The insights offered by any modern analytics platform are invaluable and easy to use, but we tend to forget that our server logs are also an incredible source of information, one that we can query with little or no effort using a really simple script.
The idea for this article came from a request to analyse IIS logs in real time without changing the web application code or adding any JavaScript for Google Analytics (or a similar platform), so I chose to analyse the logs with a bash script.
Installing Splunk, SpectX, or a similar tool would probably yield a more elegant solution with many more options, but for this specific request those tools were only mentioned, not implemented.
Goal
I wanted to generate a top 10 list of IPs for each IIS log file.
Internet Information Services (IIS) Logs
It’s important to start from the way IIS implements the W3C log file standard, which is described in more detail in the official documentation pages.
This is a sample IIS log file:
```
#Software: Microsoft Internet Information Services 10.0
#Version: 1.0
#Date: 2020-07-13 14:00:02
#Fields: date time s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username c-ip cs(User-Agent) cs(Referer) sc-status sc-substatus sc-win32-status time-taken
2020-07-13 14:00:02 192.168.32.20 HEAD /favicon.ico - 80 - 192.168.31.5 Mozilla/5.0+(Linux;+Android+6.0.1;+Nexus+5X+Build/MMB29P)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/80.0.3987.92+Mobile+Safari/537.36+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html) - 301 0 0 0
```
Looking at the first five lines of the sample log file we will notice a few things:
- The log is generated by Internet Information Services, and the first line reports its version number (10.0 here).
- Date and time are in UTC, so for me, living down under in Australia in the AEST time zone (Australian Eastern Standard Time, UTC+10), the 14:00 UTC timestamp is midnight.
- The 4th line is describing the fields
- The 5th line is my first entry.
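The UTC-to-AEST arithmetic above can be checked from the shell. This is a minimal sketch, assuming GNU date (the default on most WSL distributions); the helper name utc_to_aest is made up for this example, and "AEST-10" is a POSIX TZ string meaning a zone called AEST that is 10 hours ahead of UTC.

```shell
#!/bin/bash
# Hypothetical helper: convert an IIS log timestamp (UTC) to AEST (UTC+10).
# Assumes GNU date, because of the -d option.
utc_to_aest () {
    # "AEST-10" is a POSIX TZ string: local time = UTC + 10 hours, no DST
    TZ='AEST-10' date -d "$1 UTC" '+%Y-%m-%d %H:%M:%S'
}

# Timestamp taken from the first entry of the sample log
utc_to_aest '2020-07-13 14:00:02'   # prints 2020-07-14 00:00:02
```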
The table below maps each field to its position number:
# | Field |
1 | date |
2 | time |
3 | s-ip |
4 | cs-method |
5 | cs-uri-stem |
6 | cs-uri-query |
7 | s-port |
8 | cs-username |
9 | c-ip |
10 | cs(User-Agent) |
11 | cs(Referer) |
12 | sc-status |
13 | sc-substatus |
14 | sc-win32-status |
15 | time-taken |
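The field list is configurable in IIS, so the positions in this table may differ on another server. As a rough sketch, the position of a field can be looked up from the #Fields header instead of being counted by hand. The function name field_number is hypothetical, and the sketch assumes the log is space-delimited with a #Fields header, as in the sample above.

```shell
#!/bin/bash
# Hypothetical helper: print the position of a named field in an IIS log,
# read from the "#Fields:" header line.
# Usage: field_number FIELD_NAME LOGFILE
field_number () {
    local name=$1 logfile=$2
    awk -v name="$name" '
        /^#Fields:/ {
            sub(/\r$/, "")   # drop a trailing carriage return (Windows line endings)
            # Token 1 is "#Fields:" itself, so the data position is token index - 1
            for (i = 2; i <= NF; i++)
                if ($i == name) { print i - 1; exit }
        }' "$logfile"
}

# Example (the file name is illustrative):
# field_number c-ip u_ex200715.log
```

With the header shown in the sample log, looking up c-ip gives 9 and sc-status gives 12, matching the table.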
Choosing a BASH script
I think my choice was obvious, because the combination of cut | sort | uniq | head is the answer to this problem. The nice part is that Windows Subsystem for Linux (WSL) brings bash to Windows, where it can be called in-line.
```
#!/bin/bash
# Paolo Frigo, https://www.scriptinglibrary.com
#THIS SCRIPT RETURNS THE TOP N IP IN THE IIS LOGS FOLDER

TOP=10
function find_top_ip () {
    IISLOG=$1
    echo "This is the TOP $TOP for the $IISLOG log file"
    echo "--------------------------------------"
    echo "  HIT  |    IP"
    echo "--------------------------------------"
    grep -v '^#' "$IISLOG" | cut -d ' ' -f 9 | sort | uniq -c | sort -rn | head -n "$TOP"
    echo "--------------------------------------"
}

for f in *.log; do find_top_ip "$f"; done
```
At line 12, how cut, sort, uniq and head work together is maybe not obvious to everybody, but it becomes clear if you read the man pages of these commands and experiment by adding one command after the other with the pipe symbol, inspecting the output at each step.
Line 16 simply calls the find_top_ip function for each log file in the current directory.
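To make that experiment concrete, here is a small sketch that builds the pipeline one stage at a time. The six log entries are made up for illustration (only the layout follows the sample log), and sort -rn is used so that the counts are compared numerically.

```shell
#!/bin/bash
# Build the pipeline stage by stage on a tiny, made-up sample.
# Field 9 is c-ip, as in the field table; the entries are illustrative only.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
2020-07-13 14:00:02 192.168.32.20 GET / - 80 - 192.168.31.5 curl/7.68.0 - 200 0 0 0
2020-07-13 14:00:03 192.168.32.20 GET / - 80 - 192.168.30.50 curl/7.68.0 - 200 0 0 1
2020-07-13 14:00:04 192.168.32.20 GET / - 80 - 192.168.31.5 curl/7.68.0 - 200 0 0 0
2020-07-13 14:00:05 192.168.32.20 GET / - 80 - 192.168.31.5 curl/7.68.0 - 200 0 0 2
2020-07-13 14:00:06 192.168.32.20 GET / - 80 - 192.168.30.50 curl/7.68.0 - 200 0 0 0
2020-07-13 14:00:07 192.168.32.20 GET / - 80 - 192.168.30.18 curl/7.68.0 - 200 0 0 1
EOF

echo "== cut: one client IP per line =="
cut -d ' ' -f 9 "$sample"

echo "== sort: identical IPs become adjacent =="
cut -d ' ' -f 9 "$sample" | sort

echo "== uniq -c: count each run of identical IPs =="
cut -d ' ' -f 9 "$sample" | sort | uniq -c

echo "== sort -rn: highest count first =="
cut -d ' ' -f 9 "$sample" | sort | uniq -c | sort -rn

echo "== head: keep only the top 2 =="
cut -d ' ' -f 9 "$sample" | sort | uniq -c | sort -rn | head -n 2
```

The first sort matters because uniq only collapses adjacent duplicates; without it, the same IP would be counted in several separate runs.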
I copied this script to the IIS log folder, opened a WSL terminal in that folder, ran the script, and reviewed the result.
```
paolo@wsl:/mnt/d/IIS$ ./top10.sh
This is the TOP 10 for the u_ex200715.log log file
--------------------------------------
  HIT  |    IP
--------------------------------------
   8923 192.168.31.10
   1494 192.168.30.50
   1493 192.168.30.126
   1472 192.168.30.146
   1470 192.168.30.163
   1461 192.168.30.18
   1454 192.168.32.5
   1451 192.168.32.82
   1448 192.168.31.5
   1440 192.168.30.5
--------------------------------------
This is the TOP 10 for the u_ex200714.log file
--------------------------------------
  HIT  |    IP
--------------------------------------
  21142 192.168.30.50
   1164 192.168.30.18
    940 192.168.30.239
    361 192.168.31.12
    216 192.168.31.11
    144 192.168.32.68
    138 192.168.31.206
    129 192.168.30.5
     90 192.168.31.5
     79 192.168.32.5
--------------------------------------
```
Other ideas
- c-ip is field #9, but the script can be edited quickly to get the top 10 client usernames or browser user agents by picking the right field.
- Changing ‘-f 9’ at line 12 to ‘-f 10’ will generate the top 10 browser user agents, for instance.
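These ideas could be folded into one function that takes the field number as a parameter. This is only a sketch: the name top_field and its arguments are hypothetical, not part of the original script, and it skips the # header lines so that they are not counted as entries.

```shell
#!/bin/bash
# Hypothetical generalisation: top N values of any field in one IIS log file.
# Usage: top_field LOGFILE FIELD_NUMBER [N]   (N defaults to 10)
top_field () {
    local logfile=$1 field=$2 top=${3:-10}
    # Skip the "#" header lines, then count and rank the chosen field
    grep -v '^#' "$logfile" \
        | cut -d ' ' -f "$field" \
        | sort | uniq -c | sort -rn \
        | head -n "$top"
}

# Examples, with field numbers taken from the field table:
# top_field u_ex200715.log 8      # cs-username
# top_field u_ex200715.log 10     # cs(User-Agent)
# top_field u_ex200715.log 12 5   # top 5 sc-status codes
```

Splitting on spaces works for the user agent as well, because IIS writes the spaces inside that field as + characters, as the sample log shows.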
Log Parser
It’s worth mentioning that there is also an interesting tool called Log Parser that you can use to run more advanced queries against your IIS logs.
- Link: https://www.iis.net/downloads/community/2010/04/log-parser-22
- Doc: https://docs.microsoft.com/en-us/previous-versions/windows/it-pro/windows-xp/bb878032(v=technet.10)
Wrap-Up
Once again I’ve chosen to write a script to get the job done, even though there are plenty of alternatives and better solutions available. I would recommend those alternatives for more advanced scenarios, but for this request I think the simplicity of the script wins hands-down. As usual, this script is available on my GitHub repository.