As a DevOps engineer, most of my energy goes into doing things right, or fixing them properly, from the start, with the aim of creating simple or at least straightforward processes. But this is not an article about my endless war against technical debt, or about avoiding shortcuts whenever possible.
The more things change, the more they stay the same
The values and culture inherited from ITIL, Agile and DevOps have massively influenced the whole modern software life cycle, and CI/CD has extended to code quality, testing, delivery and more. After all these years, most of our daily tasks and our business-as-usual should have changed, right? Well… not yet!
Full-Stack: Troubleshooting challenges
IT systems are getting more complex, more distributed, more heterogeneous and more isolated than ever! Consequently, what makes the difference is the technical experience needed to troubleshoot them, not just to design them properly.
People who are truly knowledgeable about the whole stack of practices (ops and dev) and technologies, and not simply aware of it, are always in short supply these days. At least, that is what my work experience tells me.
Let’s discuss a scenario that can happen to any of us working in IT: for example, an application that is not working as expected.
Most modern technologies aim to avoid this problem through the right design: implementing configuration management (infrastructure-as-code) and leveraging a declarative description of our system, so that an environment that has drifted from the desired state can be fixed idempotently, or new changes can simply be pushed or reverted. The 12-factor app methodology, at its core, tries to decouple applications into microservices, like building blocks, using container technologies that are in theory immutable and stateless. Whenever a container is no longer healthy, we can destroy it and start a new one, or rebuild the image if needed.
But how can we measure, monitor, test, validate and troubleshoot our systems? In my experience, tools by themselves are never the solution; what matters is the process we follow to implement the right tools in meaningful ways.
As Devs, we are expected to create and extend systems to offer new functionality; as Ops, we want our systems to be robust and to perform as expected. How can we make sure we are not just influenced by impressions, but supported by metrics? That’s why we need to work together.
Using PowerShell and a mix of .NET or Python if needed!
In my opinion, this is where PowerShell, with its large number of cmdlets and the .NET Framework always available to provide further functionality, or Python, with its rich standard library and thousands of external packages available through pip, lets us cover a lot of ground. So knowing both is very important, at least for me.
Whenever there is a new deployment, there are always things that may not be configured correctly or may not behave as expected.
Servers running Windows Server Core (with no Desktop Experience) and containers don’t let a user gather that information through a graphical user interface. In a Windows environment, PowerShell should in most cases be the obvious choice. Especially in a higher environment such as Production, a digitally signed and well-tested PowerShell script is probably the only kind of script I would trust to run, unattended or not, to assess whether there is a problem or to apply changes or workarounds. This matters most during after-hours shifts, or when less experienced engineers run scripts without fully understanding the code.
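As a small illustration of reaching into .NET from PowerShell, here is a minimal sketch of a TCP port probe built on the `System.Net.Sockets.TcpClient` class. The function name `Test-TcpPort` and the timeout default are assumptions made for this example, not something the article relies on later. The demo part opens a temporary listener on the loopback address so the sketch runs without any real server.

```powershell
function Test-TcpPort {
    # Hypothetical helper built on the .NET TcpClient class.
    param(
        [Parameter(Mandatory)] [string] $ComputerName,
        [Parameter(Mandatory)] [int] $Port,
        [int] $TimeoutMs = 2000
    )

    $client = [System.Net.Sockets.TcpClient]::new()
    try {
        # ConnectAsync returns a Task; Wait returns $false if the timeout expires first.
        $task = $client.ConnectAsync($ComputerName, $Port)
        return ($task.Wait($TimeoutMs) -and $client.Connected)
    }
    catch {
        # Connection refused, name resolution failures and similar errors land here.
        return $false
    }
    finally {
        $client.Close()
    }
}

# Self-contained demo: listen on a free local port, probe it, then stop listening.
$listener = [System.Net.Sockets.TcpListener]::new([System.Net.IPAddress]::Loopback, 0)
$listener.Start()
$port = $listener.LocalEndpoint.Port
$openResult = Test-TcpPort -ComputerName '127.0.0.1' -Port $port
$listener.Stop()
$closedResult = Test-TcpPort -ComputerName '127.0.0.1' -Port $port

"Port $port while listening: $openResult"
"Port $port after stopping : $closedResult"
```

The same idea works in Python with the `socket` module; which one to reach for mostly depends on what is already installed on the machine you are troubleshooting.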
More than simply a command prompt
Let’s imagine that all we have is a PowerShell session to that environment:

    Enter-PSSession -ComputerName Test-Server

The usual tools from our old cmd prompt are still available. To name just a few:
- systeminfo
- ipconfig
- nslookup
- netdom
- getmac
- hostname
- certutil
- net
- netsh
- ping
- tracert
But to process the output of these commands you need our dear regular expressions, and as the old adage goes:
Some people, when confronted with a problem, think “I know, I’ll use regular expressions.” Now they have two problems.
PowerShell can work with objects, not just parse text input or output; that’s the beauty of it, right? Let’s assume we suspect that one of our app servers, MyAppServer01, has a misconfigured host firewall that prevents any connection from our clients to port 12345 on that server. As a test, we want to confirm that the Windows host firewall is the probable root cause.
Let’s test whether the server is reachable (like a ping test, provided ICMP is allowed).
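Before running the tests, the point about objects versus text can be made concrete with a short sketch. The sample text below is hard-coded and only imitates the shape of `ipconfig` output, and the custom object stands in for what a cmdlet such as `Get-NetIPAddress` returns; both are assumptions so the example runs anywhere.

```powershell
# Hypothetical sample of text output, hard-coded so the sketch runs anywhere.
$sampleText = @'
Ethernet adapter Ethernet0:

   IPv4 Address. . . . . . . . . . . : 10.0.0.2
   Subnet Mask . . . . . . . . . . . : 255.255.255.0
'@

# Text approach: a regular expression tied to the exact wording of the output.
if ($sampleText -match 'IPv4 Address[ .]*:\s*(?<ip>\d{1,3}(\.\d{1,3}){3})') {
    $ipFromText = $Matches['ip']
}

# Object approach: properties are read by name, no parsing involved.
# A PSCustomObject stands in for what a cmdlet such as Get-NetIPAddress returns.
$adapter = [pscustomobject]@{
    InterfaceAlias = 'Ethernet0'
    IPAddress      = '10.0.0.2'
    PrefixLength   = 24
}
$ipFromObject = $adapter.IPAddress

"Parsed from text : $ipFromText"
"Read from object : $ipFromObject"
```

The regular expression breaks as soon as the label changes (a localized Windows, for instance), while the property name stays the same.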

    PS> Test-Connection -ComputerName "MyAppServer01"

    Source  Destination    IPV4Address  IPV6Address  Bytes  Time(ms)
    ------  -----------    -----------  -----------  -----  --------
    MYPC    MyAppServer01  10.0.0.2                  32     0
    MYPC    MyAppServer01  10.0.0.2                  32     0
    MYPC    MyAppServer01  10.0.0.2                  32     1
    MYPC    MyAppServer01  10.0.0.2                  32     0

Now test whether TCP port 12345 is open on the server.

    PS> Test-NetConnection -ComputerName "MyAppServer01" -Port 12345
    WARNING: TCP connect to (10.0.0.2 : 12345) failed

    ComputerName           : MyAppServer01
    RemoteAddress          : 10.0.0.2
    RemotePort             : 12345
    InterfaceAlias         : vEthernet (Virtual Switch)
    SourceAddress          : 10.0.0.20
    PingSucceeded          : True
    PingReplyDetails (RTT) : 0 ms
    TcpTestSucceeded       : False

Port 12345 on the host is closed or filtered (TcpTestSucceeded = False).
If we think the firewall is the reason we can’t reach that port, let’s see whether a host firewall is up.

    PS> Invoke-Command -ComputerName "MyAppServer01" -ScriptBlock {Get-NetFirewallProfile | Select-Object Name, Enabled | Format-Table}

    name    Enabled
    ----    -------
    Domain     True
    Private    True
    Public     True

OK, the Windows Firewall is up. Let’s disable it temporarily to see whether that is the cause.

    PS> Invoke-Command -ComputerName "MyAppServer01" -ScriptBlock {Set-NetFirewallProfile -Profile Domain,Public,Private -Enabled False; Get-NetFirewallProfile | Select-Object Name, Enabled | Format-Table}

    name    Enabled
    ----    -------
    Domain    False
    Private   False
    Public    False

Let’s test again… and now we can reach that port:

    PS> Test-NetConnection -ComputerName "MyAppServer01" -Port 12345

    ComputerName     : MyAppServer01
    RemoteAddress    : 10.0.0.2
    RemotePort       : 12345
    InterfaceAlias   : vEthernet (Virtual Switch)
    SourceAddress    : 10.0.0.20
    TcpTestSucceeded : True

Yes, it was the firewall blocking the traffic. Now we can immediately revert the firewall state and bring it back up.

    PS> Invoke-Command -ComputerName "MyAppServer01" -ScriptBlock {Set-NetFirewallProfile -Profile Domain,Public,Private -Enabled True; Get-NetFirewallProfile | Select-Object Name, Enabled | Format-Table}

    name    Enabled
    ----    -------
    Domain     True
    Private    True
    Public     True

Then we can start fixing the root cause, for example by creating a firewall rule that allows inbound traffic to port 12345, which we need to use.
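A minimal sketch of such a rule, meant to be run on the server itself in an elevated session, could look like the following. The rule name, display name and description are assumptions chosen for the MyApp example, so adjust them to your own conventions. `-WhatIf` is used so that the command only previews the change; remove it to actually create the rule.

```powershell
# Assumed rule definition: the names follow the MyApp example, adjust as needed.
$ruleParams = @{
    Name        = 'MyApp-In-TCP'
    DisplayName = 'MyApp (TCP-In)'
    Description = 'Inbound rule for MyApp to allow MyApp traffic. [TCP 12345]'
    Direction   = 'Inbound'
    Protocol    = 'TCP'
    LocalPort   = 12345
    Action      = 'Allow'
    Profile     = 'Domain, Private, Public'
}

# The NetSecurity cmdlets only exist on Windows, so check before calling.
if (Get-Command -Name New-NetFirewallRule -ErrorAction SilentlyContinue) {
    # Needs an elevated session; -WhatIf previews the change without applying it.
    New-NetFirewallRule @ruleParams -WhatIf
}
else {
    Write-Warning 'New-NetFirewallRule is not available on this machine.'
}
```

Keeping the parameters in a hashtable makes the rule easy to review and to put under version control alongside the tests.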
Pester as help in troubleshooting
Instead of chaining a series of tests in a single PowerShell script, a good idea is to start using Pester. The main reason to write Pester scripts is to make sure that the code we run has been tested beforehand, and that we can review it, put it under version control and sign it so it can run in environments where security is enforced. Pester scripts are generally elegant and easy to read. Sharing them between Dev and Ops is always a good opportunity for both sides to learn what the other has available.
Let’s create this “Troubleshooting.Tests.ps1”:

    # Uses the Pester 4+ operator syntax (Should -Be).
    # Under Pester 5, repeat these assignments in a BeforeAll block as well.
    $MyServer = "MyAppServer01"
    $MyPort   = 12345

    Describe "My Usual troubleshooting tests for $($MyServer)" {
        It "My App Server $($MyServer) should be reachable from my workstation" {
            Test-Connection -ComputerName $MyServer -Count 1 -Quiet | Should -Be $true
        }
        It "On $($MyServer) Port $($MyPort) should be open" {
            # Evaluate the property directly; piping a script block would compare the block itself.
            (Test-NetConnection -ComputerName $MyServer -Port $MyPort).TcpTestSucceeded | Should -Be $true
        }
    }

Running these tests produces this output:

    PS > Invoke-Pester .\troubleshooting.tests.ps1
    Describing My Usual troubleshooting tests for MyAppServer01
     [+] My App Server MyAppServer01 should be reachable from my workstation 330ms
     [+] On MyAppServer01 Port 12345 should be open 15ms
    Tests completed in 345ms
    Passed: 2 Failed: 0 Skipped: 0 Pending: 0 Inconclusive: 0

Automating unit tests, integration tests and acceptance tests like these makes deployments and testing fail fast, and your job easier.
Following the same idea, let’s think of all the tests we can run on MyAppServer01 itself. For example, we need a specific service running, the local port open, and a firewall rule that lets inbound requests land on that port instead of being dropped or filtered.
Let’s connect to that server with a PSSession.

    Enter-PSSession -ComputerName MyAppServer01

Let’s check if the service (myapp) is running:

    PS > Get-Service myapp | Select-Object Name, Status

    Name   Status
    ----   ------
    myapp  Running

Then let’s see whether there is a process listening on that port:

    PS > Get-NetTCPConnection -LocalPort 12345 | Select-Object LocalPort, State

    localport  State
    ---------  -----
        12345 Listen
        12345 Listen

Let’s check whether there is a firewall rule that allows traffic on port 12345:

    #requires -RunAsAdministrator
    Get-NetFirewallPortFilter -Protocol TCP | Where-Object { $_.LocalPort -eq '12345' } | Get-NetFirewallRule

    Name                  : MyApp-In-TCP
    DisplayName           : MyApp (TCP-In)
    Description           : Inbound rule for MyApp to allow MyApp traffic. [TCP 12345]
    DisplayGroup          : MyApp
    Group                 : @FirewallAPI.dll,-28752
    Enabled               : True
    Profile               : Domain, Private, Public
    Platform              : {}
    Direction             : Inbound
    Action                : Allow
    EdgeTraversalPolicy   : Block
    LooseSourceMapping    : False
    LocalOnlyMapping      : False
    Owner                 :
    PrimaryStatus         : OK
    Status                : The rule was parsed successfully from the store. (65536)
    EnforcementStatus     : NotApplicable
    PolicyStoreSource     : PersistentStore
    PolicyStoreSourceType : Local

These checks can be summarized in a Pester script as well.

    Describe "Test on my MyAppServer01" {
        It "MyApp Service should be running" {
            (Get-Service MyApp).Status | Should -Be "Running"
        }
        It "MyApp should be listening on port 12345" {
            # Filter on the Listen state so established connections do not skew the check.
            Get-NetTCPConnection -LocalPort 12345 -State Listen | Should -Not -BeNullOrEmpty
        }
        It "The Firewall rule for MyApp should be enabled" {
            (Get-NetFirewallRule -Name "MyApp-In-TCP").Enabled | Should -Be "True"
        }
    }

With these results:

    PS > Invoke-Pester .\troubleshooting2.tests.ps1
    Describing Test on my MyAppServer01
     [+] MyApp Service should be running 33ms
     [+] MyApp should be listening on port 12345 850ms
     [+] The Firewall rule for MyApp should be enabled 1.34s
    Tests completed in 2.22s
    Passed: 3 Failed: 0 Skipped: 0 Pending: 0 Inconclusive: 0

There can also be environment variables, files, logs, certificates and so on to check.
This ability to gather just the information you need and avoid the noise lets you create an (imperative) script that checks the requirements before deploying your application, or validates the deployment at the end.
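As a rough sketch of such a prerequisite check, the function below tests for environment variables and paths and returns one object per check rather than free text. The function name `Test-DeploymentPrerequisite` and the `MYAPP_*` variable names are made up for this example; the same checks could equally be written as Pester `It` blocks.

```powershell
function Test-DeploymentPrerequisite {
    # Hypothetical helper: returns one result object per check instead of text.
    param(
        [string[]] $EnvironmentVariable = @(),
        [string[]] $Path = @()
    )

    foreach ($name in $EnvironmentVariable) {
        $value = [Environment]::GetEnvironmentVariable($name)
        [pscustomobject]@{
            Check  = "Environment variable '$name' is set"
            Passed = -not [string]::IsNullOrEmpty($value)
        }
    }

    foreach ($item in $Path) {
        [pscustomobject]@{
            Check  = "Path '$item' exists"
            Passed = Test-Path -Path $item
        }
    }
}

# Example run with made-up requirements; replace them with your application's own.
$env:MYAPP_ENVIRONMENT = 'Test'
$results = Test-DeploymentPrerequisite -EnvironmentVariable 'MYAPP_ENVIRONMENT', 'MYAPP_MISSING' -Path $PSHOME
$results | Format-Table -AutoSize

# Count the failures so a pipeline step can stop early when something is missing.
$failed = @($results | Where-Object { -not $_.Passed })
"{0} of {1} checks failed" -f $failed.Count, @($results).Count
```

Because the results are objects, they can be filtered, exported or turned into a non-zero exit code by whatever runs the deployment.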
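How does this fit with containers?
Well, in the context of a containerized deployment of a .NET web application, a very simple PowerShell health check (every 30 seconds) in your Dockerfile can look similar to this. The backtick line continuations assume that the Dockerfile starts with the parser directive # escape=` and the check reports its result through the process exit code: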

    HEALTHCHECK --interval=30s `
      CMD powershell -command `
        try { `
          $response = Invoke-WebRequest http://localhost/myapphealthcheck -UseBasicParsing; `
          if ($response.StatusCode -eq 200) { exit 0 } `
          else { exit 1 }; `
        } catch { exit 1 }

From Troubleshooting to Performance Tuning
The closer Dev and Ops work together, the more refined the metrics we can gather. For instance, Devs could expose a JSON document on a health page like http://localhost/myapphealthcheck. Ops can then gather metrics and compare response times, aggregate metrics, correlate services and other dependencies, and consolidate or parse the various log files.
Troubleshooting skills overlap across roles (Devs and Ops); breaking down silos offers room for improvement and can turn into performance optimisation opportunities.
In my experience, fixing bottlenecks is the only way to deliver tangible improvements and achieve long-lasting results.
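To give an idea of how response times could be compared, here is a minimal sketch that runs a probe several times and summarises the timings. The function name `Measure-HealthCheck` is an assumption for this example, and the demo uses a short sleep as a stand-in probe so it runs without a web server.

```powershell
function Measure-HealthCheck {
    # Hypothetical helper: runs a probe several times and summarises the timings.
    param(
        [Parameter(Mandatory)] [scriptblock] $Probe,
        [int] $Count = 5
    )

    $timings = foreach ($i in 1..$Count) {
        (Measure-Command -Expression $Probe).TotalMilliseconds
    }

    $stats = $timings | Measure-Object -Average -Minimum -Maximum
    [pscustomobject]@{
        Samples   = $stats.Count
        AverageMs = [math]::Round($stats.Average, 1)
        MinimumMs = [math]::Round($stats.Minimum, 1)
        MaximumMs = [math]::Round($stats.Maximum, 1)
    }
}

# Stand-in probe so the sketch runs without a web server.
$summary = Measure-HealthCheck -Probe { Start-Sleep -Milliseconds 20 } -Count 3
$summary

# Against the real endpoint the probe would be something like:
# Measure-HealthCheck -Probe { Invoke-RestMethod -Uri 'http://localhost/myapphealthcheck' }
```

`Invoke-RestMethod` converts a JSON response into objects, so any fields the Devs expose on the health page can be read by property name and stored next to the timings.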
Wrap-Up
Troubleshooting requires knowledge of the solution and the infrastructure, and attention to detail. There is no room for improvisation. The best we can do is reuse what we have learnt to prevent the problem from occurring again, using that knowledge to develop, deliver and deploy a better solution, and feeding it back into the development pipeline as continuous improvement. So the way we document and measure this is by writing code that we can reuse, share and improve over time, across the whole software life cycle.