Monitoring is an important activity in IT operations: it’s essential for correlating the state of all the moving parts of our systems and applications and for building a big picture of the health of the whole environment. Before going down the rabbit hole of complicated monitoring tools and techniques, let’s start by acknowledging that monitoring is subjective: depending on the case it can be very basic or very detailed, and that drives the choice of a specific tool or strategy. There is no one-size-fits-all. This week I needed to implement a custom check to monitor the network load on a Windows OS, and instead of looking for a third-party tool and possibly deploying yet another agent on the servers, I wrote a PowerShell script to do it.
I like to invest time and effort in monitoring, especially for on-premises environments. In these environments I think it’s more common to find legacy systems or delicate integrations between different software solutions, which may need extra care to be supported properly. By contrast, public and private cloud providers include infrastructure monitoring in their offering: for them it’s built into the platform, and it’s required if they want to bill you by the minute. In my ideal world I’d like to have similar visibility and granular detail, even if some of this effort can look like overkill for most of my use cases.
NSCP
What I frequently use, both on-prem and on cloud infrastructure, is Nagios: I use it for Linux, Windows and network devices. The Nagios server may require agents if you want to perform white-box (inside-the-box) monitoring. On Windows the NSClient++ (NSCP) agent can monitor most of the things I need with a little help from PowerShell, a batch script or any executable, but the load on the network interface card is not there out of the box.
In the last 3 years I’ve created all sorts of custom checks specific to the operating system, the physical machine or the application installed on the host, but the network load on VMs was something I had never needed before.
Typeperf
I started by having a look at Typeperf:
    PS D:\> typeperf "\Network Interface(*)\Bytes Total/sec"

    "(PDH-CSV 4.0)","\\MYWS\Network Interface(Realtek PCIe GbE Family Controller)\Bytes Total/sec","\\MYWS\Network Interface(Intel[R] PRO_1000 MT Desktop Adapter)\Bytes Total/sec"
    "02/06/2020 21:58:00.053","34466.207676","0.000000"
    "02/06/2020 21:58:01.055","13841.420713","0.000000"
    "02/06/2020 21:58:02.059","6039.413770","0.000000"
    "02/06/2020 21:58:03.062","6070.615071","0.000000"
    "02/06/2020 21:58:04.066","6891.797387","0.000000"
    "02/06/2020 21:58:05.068","7167.699324","0.000000"
    "02/06/2020 21:58:06.071","27653.817318","0.000000"
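Since typeperf prints CSV, its output can be post-processed with the standard CSV cmdlets. The following is only a minimal sketch of that idea: the sample lines are copied from the output above (one adapter column kept for brevity), and the variable names are made up for illustration.

```powershell
# Sample lines copied from the typeperf output above (one adapter column only).
$lines = @(
    '"(PDH-CSV 4.0)","\\MYWS\Network Interface(Realtek PCIe GbE Family Controller)\Bytes Total/sec"'
    '"02/06/2020 21:58:00.053","34466.207676"'
    '"02/06/2020 21:58:01.055","13841.420713"'
)

# The first line becomes the header, so each counter path is a property name.
$rows = $lines | ConvertFrom-Csv
$column = '\\MYWS\Network Interface(Realtek PCIe GbE Family Controller)\Bytes Total/sec'

# CSV values are strings: cast them to numbers before averaging.
$values = foreach ($row in $rows) { [double] $row.$column }
$average = ($values | Measure-Object -Average).Average

Write-Output "Average: $([math]::Round($average, 2)) bytes/sec over $($values.Count) samples"
```

On a real host the `$lines` array would be replaced by the captured output of the typeperf command, minus any blank or status lines.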
Get-Counter
However, I preferred to use Get-Counter to read the performance of the network card(s):
    PS D:\> get-counter

    Timestamp                 CounterSamples
    ---------                 --------------
    6/02/2020 9:59:09 PM      \\MYWS\network interface(realtek pcie gbe family controller)\bytes total/sec :
                              10496.1209792292

                              \\MYWS\network interface(intel[r] pro_1000 mt desktop adapter)\bytes total/sec :
                              0

                              \\MYWS\network interface(teredo tunneling pseudo-interface)\bytes total/sec :
                              0

                              \\MYWS\processor(_total)\% processor time :
                              5.62432829480601

                              \\MYWS\memory\% committed bytes in use :
                              79.194275727308

                              \\MYWS\memory\cache faults/sec :
                              0.998679446168332

                              \\MYWS\physicaldisk(_total)\% disk time :
                              0

                              \\MYWS\physicaldisk(_total)\current disk queue length :
                              0
I’ve expanded the CounterSamples property to get more details:
    PS D:\> (get-counter).countersamples

    Path                                                                            InstanceName                          CookedValue
    ----                                                                            ------------                          -----------
    \\MYWS\network interface(realtek pcie gbe family controller)\bytes total/sec    realtek pcie gbe family controller    8555.78302677938
    \\MYWS\network interface(intel[r] pro_1000 mt desktop adapter)\bytes total/sec  intel[r] pro_1000 mt desktop adapter  0
    \\MYWS\processor(_total)\% processor time                                       _total                                4.56429085701864
    \\MYWS\memory\% committed bytes in use                                                                                79.2048703132209
    \\MYWS\memory\cache faults/sec                                                                                        0.997875323860436
    \\MYWS\physicaldisk(_total)\% disk time                                         _total                                0.0537194780225807
    \\MYWS\physicaldisk(_total)\current disk queue length                           _total                                0
On my workstation I have 2 network interfaces, but from the point of view of any VM running on MS Hyper-V, which is what I wanted to monitor, the NIC will always be called “microsoft hyper-v network adapter”.
I then followed the usual practice of creating a custom check and a macro in Nagios, where the state OK/WARNING/CRITICAL is determined by the exit code (0, 1, 2) and the message comes from the output of the command.
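As a minimal sketch of that exit-code convention, a threshold check could be isolated in a small helper like the one below. The function name `Get-NagiosState` and its parameters are hypothetical, made up for this illustration; they are not part of Nagios or NSClient++.

```powershell
# Hypothetical helper: maps a measured value to a Nagios state and exit code.
function Get-NagiosState {
    param(
        [Parameter(Mandatory)] [double] $Value,
        [Parameter(Mandatory)] [double] $Warning,
        [Parameter(Mandatory)] [double] $Critical
    )
    # Test the most severe threshold first so CRITICAL wins over WARNING.
    if ($Value -gt $Critical) { return [pscustomobject]@{ State = 'CRITICAL'; ExitCode = 2 } }
    if ($Value -gt $Warning)  { return [pscustomobject]@{ State = 'WARNING';  ExitCode = 1 } }
    return [pscustomobject]@{ State = 'OK'; ExitCode = 0 }
}

$result = Get-NagiosState -Value 35 -Warning 29 -Critical 49
Write-Output "$($result.State): 35 % Network utilisation"
# A real check script would end with: exit $result.ExitCode
```

Keeping the comparison in one place means the message and the exit code cannot drift apart when the thresholds change.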
PowerShell Script
This is a really basic script. It could be modified to accept the network card name and the warning and critical thresholds as parameters, but in my simple case that was not required, and leaving them out makes the script easier to read and understand for this article as well.
    #Nagios/NSCP Network Interface Card Load Check
    #Author: Paolo Frigo, https://www.scriptinglibrary.com

    $NETWORK_CARD ="microsoft hyper-v network adapter*"
    $WARNING = 29     # warning threshold, % utilisation
    $CRITICAL = 49    # critical threshold, % utilisation

    # Bytes/sec of the matching adapter, multiplied by 8 to get bits/sec
    $TransferRate = ((get-counter).countersamples | where-object {$_.instancename -like "$NETWORK_CARD"} | select-object -exp CookedValue )*8
    # Utilisation as a percentage of a 1 Gb/s link
    $NetworkUtilisation = [math]::round($TransferRate/1000000000*100,2)

    if ($NetworkUtilisation -gt $CRITICAL){
        Write-Output "CRITICAL: $($NetworkUtilisation) % Network utilisation, $($TransferRate.ToString('N0')) b/s"
        exit 2
    }
    if ($NetworkUtilisation -gt $WARNING){
        Write-Output "WARNING: $($NetworkUtilisation) % Network utilisation, $($TransferRate.ToString('N0')) b/s"
        exit 1
    }
    Write-Output "OK: $($NetworkUtilisation) % Network utilisation, $($TransferRate.ToString('N0')) b/s"
    exit 0
Depending on your use case it can be useful by itself even without a monitoring tool like Nagios. This would be the output:
    PS C:\Program Files\NSClient++\scripts> .\check-nic.ps1
    OK: 0 % Network utilisation, 28,196 b/s
I’ve added the percentage and the transfer rate in b/s with separators to make it easier to read.
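The script hard-codes the adapter name and a 1 Gb/s link speed. A parameterised variant of the calculation might look like the sketch below. Everything named here (`Get-NicUtilisation`, `AdapterPattern`, `LinkSpeedBitsPerSecond`) is a hypothetical choice for illustration, and the samples are fake objects shaped like the ones returned by `(Get-Counter).CounterSamples`, so the example runs without touching real counters.

```powershell
# Hypothetical, parameterised variant of the utilisation calculation.
function Get-NicUtilisation {
    param(
        [Parameter(Mandatory)] [object[]] $CounterSamples,
        [string] $AdapterPattern = 'microsoft hyper-v network adapter*',
        # Assumed link speed in bits per second (1 Gb/s, as in the script).
        [double] $LinkSpeedBitsPerSecond = 1e9
    )
    # Keep only the "bytes total/sec" samples of the matching adapters.
    $matching = @($CounterSamples | Where-Object {
        $_.Path -like '*\bytes total/sec' -and $_.InstanceName -like $AdapterPattern
    })
    # Sum over all matching adapters, so several NICs do not break the maths.
    $bytesPerSecond = 0.0
    foreach ($sample in $matching) { $bytesPerSecond += [double] $sample.CookedValue }
    $bitsPerSecond = $bytesPerSecond * 8
    [pscustomobject]@{
        BitsPerSecond = $bitsPerSecond
        Percent       = [math]::Round($bitsPerSecond / $LinkSpeedBitsPerSecond * 100, 2)
    }
}

# Fake samples for demonstration only ("vm01" is a made-up host name).
$samples = @(
    [pscustomobject]@{ Path = '\\vm01\network interface(microsoft hyper-v network adapter)\bytes total/sec'; InstanceName = 'microsoft hyper-v network adapter'; CookedValue = 12500000 }
    [pscustomobject]@{ Path = '\\vm01\processor(_total)\% processor time'; InstanceName = '_total'; CookedValue = 5.6 }
)
$usage = Get-NicUtilisation -CounterSamples $samples
Write-Output "$($usage.Percent) % Network utilisation"
```

On a Windows host the fake `$samples` array could be replaced with `(Get-Counter).CounterSamples`. Summing the matching samples is a design choice of this sketch: multiplying an array of several values by 8, as would happen in the original script if more than one adapter matched, repeats the array instead of scaling it.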
I’ve deployed the script in the scripts folder (usually C:\Program Files\NSClient++\scripts), added this to the nsclient.ini file and restarted the nscp service.
    [/settings/external scripts/scripts]
    ; Check for Nic Usage
    check_nic= cmd /c echo scripts\\check-nic.ps1; exit($LastExitCode) | powershell.exe -command -
On my Nagios servers, I’ve created the NRPE check for the target servers using the new custom check_nic command.
The PowerShell script is available on the GitHub repository as usual. Happy network monitoring!
Errata Corrige
After publishing this article, I found out from the Nagios documentation that I was wrong. The agent, via check_nt, offers out-of-the-box access to the performance counters, so there is no need for a PowerShell script to perform this type of check: you can define a macro similar to this example.
    check_nt -H IP_ADDRESS_OR_FQDN -p 12489 -v COUNTER -l "\Network Interface(microsoft hyper-v network adapter)\Bytes Total/sec" -w 3000 -c 4900
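Note that this counter reports bytes per second, so the -w and -c values are expressed in that unit rather than as a percentage. Assuming the thresholds are compared directly against the raw counter value, a percentage of the link speed can be converted with a small helper like this sketch. The function name and the 1 Gb/s default are assumptions made for illustration.

```powershell
# Hypothetical helper: converts a utilisation percentage to bytes per second.
function ConvertTo-BytesPerSecondThreshold {
    param(
        [Parameter(Mandatory)] [double] $Percent,
        # Assumed link speed in bits per second (1 Gb/s).
        [double] $LinkSpeedBitsPerSecond = 1e9
    )
    # Percentage of the link speed in bits/sec, divided by 8 to get bytes/sec.
    [math]::Round($Percent / 100 * $LinkSpeedBitsPerSecond / 8)
}

$warning  = ConvertTo-BytesPerSecondThreshold -Percent 29
$critical = ConvertTo-BytesPerSecondThreshold -Percent 49
Write-Output "-w $warning -c $critical"
```

With the 29 % and 49 % thresholds of the PowerShell script, this gives 36,250,000 and 61,250,000 bytes per second on a 1 Gb/s link.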