Reputation:
I've been trying to create a bash script to create a CSV file with every IP from an Apache access log, and count how many unique requests that IP made, along with the actual requests.
So far I have this:
#!/bin/bash
# Print the headers to the CSV file
printf "\tRequests\tIP\t\n" > memory.csv
# Create a text file named .access_log.tmp.2 with the IPs and how many requests each made in total (.access_log.tmp is the Apache access log in this case)
awk '{ print $1 }' .access_log.tmp | sort -n | uniq -c | sort -nr | head -20 > ".access_log.tmp.2"
# Make it a CSV file
sed 's/[[:space:]]\+/;/g' .access_log.tmp.2 >> memory.csv
# Remove the leftover files
rm .access_log.tmp .access_log.tmp.2
That gives an output like this:
Requests IP
20 10.0.0.1
15 10.0.0.2
This is how I would like it to look:
IP        Requests
10.0.0.1  12 "GET / HTTP/1.1" 403 490 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36"
           8 "GET /favicon.ico HTTP/1.1" 403 489 "http://10.0.0.6/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36"
10.0.0.2  13 "GET /favicon.ico HTTP/1.1" 404 486 "-" "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"
           2 "GET / HTTP/1.1" 200 3477 "-" "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"
etc.
I have no clue where to go from here.
Can someone please help?
Edit: Adding the input and output files below, as requested:
What I have right now
10.0.0.7 - - [17/Nov/2019:14:21:48 +0100] "GET / HTTP/1.1" 403 490 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36"
10.0.0.7 - - [17/Nov/2019:14:21:48 +0100] "GET /favicon.ico HTTP/1.1" 403 489 "http://10.0.0.6/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36"
10.0.0.7 - - [17/Nov/2019:14:22:39 +0100] "GET / HTTP/1.1" 200 3477 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36"
10.0.0.7 - - [17/Nov/2019:14:22:39 +0100] "GET /icons/ubuntu-logo.png HTTP/1.1" 200 3623 "http://10.0.0.6/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36"
10.0.0.7 - - [17/Nov/2019:14:22:39 +0100] "GET /favicon.ico HTTP/1.1" 404 486 "http://10.0.0.6/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36"
10.0.0.6 - - [17/Nov/2019:19:07:46 +0100] "GET / HTTP/1.1" 200 3477 "-" "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"
10.0.0.6 - - [17/Nov/2019:19:07:46 +0100] "GET /icons/ubuntu-logo.png HTTP/1.1" 200 3623 "http://10.0.0.6/" "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"
10.0.0.6 - - [17/Nov/2019:19:07:47 +0100] "GET /favicon.ico HTTP/1.1" 404 486 "-" "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"
10.0.0.6 - - [17/Nov/2019:19:07:51 +0100] "GET / HTTP/1.1" 200 3477 "-" "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"
10.0.0.6 - - [17/Nov/2019:19:07:51 +0100] "GET /icons/ubuntu-logo.png HTTP/1.1" 200 3623 "http://10.0.0.6/" "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"
10.0.0.6 - - [17/Nov/2019:19:07:51 +0100] "GET /favicon.ico HTTP/1.1" 404 486 "-" "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"
10.0.0.6 - - [17/Nov/2019:19:07:52 +0100] "GET / HTTP/1.1" 200 3477 "-" "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"
10.0.0.6 - - [17/Nov/2019:19:07:52 +0100] "GET /icons/ubuntu-logo.png HTTP/1.1" 200 3623 "http://10.0.0.6/" "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"
10.0.0.6 - - [17/Nov/2019:19:07:52 +0100] "GET /favicon.ico HTTP/1.1" 404 486 "-" "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"
^ Input
Requests IP
9 10.0.0.6
5 10.0.0.7
^ Output
What I want to have
Input is the same
IP        Requests
10.0.0.6  3 "GET / HTTP/1.1" 200 3477 "-" "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"
          3 "GET /favicon.ico HTTP/1.1" 404 486 "-" "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"
          3 "GET /icons/ubuntu-logo.png HTTP/1.1" 200 3623 "http://10.0.0.6/" "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"
10.0.0.7  1 "GET / HTTP/1.1" 403 490 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36"
          1 "GET /favicon.ico HTTP/1.1" 403 489 "http://10.0.0.6/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36"
          1 "GET / HTTP/1.1" 200 3477 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36"
          1 "GET /icons/ubuntu-logo.png HTTP/1.1" 200 3623 "http://10.0.0.6/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36"
          1 "GET /favicon.ico HTTP/1.1" 404 486 "http://10.0.0.6/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36"
^ Output
Upvotes: 1
Views: 1179
Reputation: 4865
Here is another, shorter awk solution (standard Linux gawk).
One file sweep, sort once, no string substitutions, reduced to only 3 fields.
BEGIN { FS = "( -)|(] \")" }   # field separator: ' -' or '] "'
{                              # for each input line
    ipLogsArr[$1,$4]++         # count occurrences of each IP + request combination
    ipArr[$1]++                # count occurrences of each IP
    # store "IP&&&count&&&request" so that one string sort orders by IP, then by count
    ipLogsArrVal[$1,$4] = sprintf("%s&&&%03d&&&%s", $1, ipLogsArr[$1,$4], $4)
}
END {                          # post-processing after all input has been read
    printf("%-14s %3s %s\n", "IP", "#", "log")    # output header
    count = asort(ipLogsArrVal)                   # sort the combined strings (asort is a gawk extension)
    for (i = count; i >= 1; i--) {                # walk the sorted array backward
        split(ipLogsArrVal[i], arr, "&&&")        # split back into IP, count and request
        ipOut = (currIp == arr[1]) ? "" : arr[1]  # blank the IP if it was already printed
        printf("%-14s %3d %s\n", ipOut, arr[2], arr[3])
        currIp = arr[1]                           # remember the IP to avoid repeating it
    }
}
10.0.0.7 - - [17/Nov/2019:14:21:48 +0100] "GET / HTTP/1.1" 403 490 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36"
10.0.0.7 - - [17/Nov/2019:14:21:48 +0100] "GET /favicon.ico HTTP/1.1" 403 489 "http://10.0.0.6/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36"
10.0.0.7 - - [17/Nov/2019:14:22:39 +0100] "GET / HTTP/1.1" 200 3477 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36"
10.0.0.7 - - [17/Nov/2019:14:22:39 +0100] "GET /icons/ubuntu-logo.png HTTP/1.1" 200 3623 "http://10.0.0.6/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36"
10.0.0.7 - - [17/Nov/2019:14:22:39 +0100] "GET /favicon.ico HTTP/1.1" 404 486 "http://10.0.0.6/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36"
10.0.0.6 - - [17/Nov/2019:19:07:46 +0100] "GET / HTTP/1.1" 200 3477 "-" "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"
10.0.0.6 - - [17/Nov/2019:19:07:46 +0100] "GET /icons/ubuntu-logo.png HTTP/1.1" 200 3623 "http://10.0.0.6/" "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"
10.0.0.6 - - [17/Nov/2019:19:07:47 +0100] "GET /favicon.ico HTTP/1.1" 404 486 "-" "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"
10.0.0.6 - - [17/Nov/2019:19:07:51 +0100] "GET / HTTP/1.1" 200 3477 "-" "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"
10.0.0.6 - - [17/Nov/2019:19:07:51 +0100] "GET /icons/ubuntu-logo.png HTTP/1.1" 200 3623 "http://10.0.0.6/" "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"
10.0.0.6 - - [17/Nov/2019:19:07:51 +0100] "GET /favicon.ico HTTP/1.1" 404 486 "-" "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"
10.0.0.6 - - [17/Nov/2019:19:07:52 +0100] "GET / HTTP/1.1" 200 3477 "-" "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"
10.0.0.6 - - [17/Nov/2019:19:07:52 +0100] "GET /icons/ubuntu-logo.png HTTP/1.1" 200 3623 "http://10.0.0.6/" "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"
10.0.0.6 - - [17/Nov/2019:19:07:52 +0100] "GET /favicon.ico HTTP/1.1" 404 486 "-" "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"
awk -f script.awk output.txt
IP               # log
10.0.0.7         1 GET /icons/ubuntu-logo.png HTTP/1.1" 200 3623 "http://10.0.0.6/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36"
                 1 GET /favicon.ico HTTP/1.1" 404 486 "http://10.0.0.6/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36"
                 1 GET /favicon.ico HTTP/1.1" 403 489 "http://10.0.0.6/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36"
                 1 GET / HTTP/1.1" 403 490 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36"
                 1 GET / HTTP/1.1" 200 3477 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36"
10.0.0.6         3 GET /icons/ubuntu-logo.png HTTP/1.1" 200 3623 "http://10.0.0.6/" "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"
                 3 GET / HTTP/1.1" 200 3477 "-" "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"
                 2 GET /favicon.ico HTTP/1.1" 404 486 "-" "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"
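To see how the field separator in the script above splits a log line, a quick check along these lines may help. It is only a sketch: `show_fields` and the shortened sample line are made up for illustration and are not part of the original script.

```shell
# Hypothetical, shortened sample line in Apache combined log format.
line='10.0.0.7 - - [17/Nov/2019:14:21:48 +0100] "GET / HTTP/1.1" 403 490 "-" "Mozilla/5.0"'

# show_fields is a made-up helper: print every field the separator produces.
show_fields() {
    printf '%s\n' "$1" |
    awk 'BEGIN { FS = "( -)|(] \")" }
         { for (i = 1; i <= NF; i++) printf "$%d=[%s]\n", i, $i }'
}

show_fields "$line"
```

With this separator the IP lands in `$1` and the request plus everything after it in `$4`, without its leading quote. The ` -` alternative would also match a negative UTC offset such as ` -0500` or a `-` byte count, so logs containing those would split differently.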
Upvotes: 0
Reputation: 14452
There are two possible paths: a single awk program, or a sort/uniq/awk pipeline. The second is simpler to write:
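awk '{ $2 = $3 = $4 = $5 = "" ; print }' input |   # blank the ident, user and timestamp fields
sort |
uniq -c |                                          # count identical IP + request lines
sort -k2,2 -k1,1nr |                               # group by IP, then highest count first
awk '
{
    printf "%-20s %d", ($2 != p ? $2 : ""), $1     # print each IP only on its first row
    p = $2
    for (i = 3 ; i <= NF ; i++) printf " %s", $i
    printf "\n"
}'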
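As a rough, self-contained illustration of the pipeline approach, the sketch below runs it on a tiny made-up log. The function name `summarize` and the four sample lines are assumptions for the example; the sort step here uses `sort -k2,2 -k1,1nr`, which groups by IP and then orders by descending count.

```shell
# Hypothetical four-line sample; the timestamps differ, the requests repeat.
sample='10.0.0.7 - - [17/Nov/2019:14:21:48 +0100] "GET / HTTP/1.1" 403 490
10.0.0.6 - - [17/Nov/2019:19:07:46 +0100] "GET / HTTP/1.1" 200 3477
10.0.0.6 - - [17/Nov/2019:19:07:51 +0100] "GET / HTTP/1.1" 200 3477
10.0.0.6 - - [17/Nov/2019:19:07:52 +0100] "GET /favicon.ico HTTP/1.1" 404 486'

# summarize is a made-up name; it reads an access log on stdin.
summarize() {
    awk '{ $2 = $3 = $4 = $5 = ""; print }' |   # drop ident, user and timestamp
    sort |
    uniq -c |                                    # count identical IP + request lines
    sort -k2,2 -k1,1nr |                         # group by IP, highest count first
    awk '{
        printf "%-20s %d", ($2 != p ? $2 : ""), $1   # show each IP only once
        p = $2
        for (i = 3; i <= NF; i++) printf " %s", $i
        printf "\n"
    }'
}

printf '%s\n' "$sample" | summarize
```

On this sample it prints three rows: 10.0.0.6 with counts 2 and 1, followed by 10.0.0.7 with count 1.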
The alternative pure awk solution is much longer. Run it with prog.awk < input (it relies on gawk's asorti):
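#! /usr/bin/awk -f
# Needs gawk 4.0 or later: asorti() with a custom comparison function is a gawk extension.
{
    ip = $1
    body = $6
    for (i = 7 ; i <= NF ; i++) body = body " " $i
    n[ip, body]++                      # count each IP + request combination
}

# Order the keys of n by IP ascending, then by count descending.
function sort_id_count(i1, v1, i2, v2,    ip1, ip2)
{
    ip1 = substr(v1, 1, index(v1, SUBSEP))
    ip2 = substr(v2, 1, index(v2, SUBSEP))
    if ( ip1 < ip2 ) return -1
    if ( ip1 > ip2 ) return +1
    return n[v2] - n[v1]               # descending frequency
}

END {
    na = 0
    for (k in n) a[++na] = k
    asorti(a, ai, "sort_id_count")     # ai[1..na] holds the indices of a in sorted order
    p = ""
    for (ki = 1 ; ki <= na ; ki++) {   # walk ai by index; for-in order is unspecified
        k2 = a[ai[ki]]
        ip = substr(k2, 1, index(k2, SUBSEP) - 1)
        body = substr(k2, index(k2, SUBSEP) + 1)
        if ( ip == p ) ip = "" ; else p = ip
        printf "%-20s %d %s\n", ip, n[k2], body
    }
}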
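Since the question asks for a CSV file, a delimiter-separated variant might look like the sketch below. It is not part of either approach above: `to_csv`, the `;` delimiter (borrowed from the question's sed step) and the quote-doubling convention are all assumptions, and it sticks to POSIX awk plus sort rather than gawk's asorti.

```shell
# Hypothetical sample log; the requests are shortened for readability.
sample='10.0.0.7 - - [17/Nov/2019:14:21:48 +0100] "GET / HTTP/1.1" 403 490
10.0.0.6 - - [17/Nov/2019:19:07:46 +0100] "GET / HTTP/1.1" 200 3477
10.0.0.6 - - [17/Nov/2019:19:07:51 +0100] "GET / HTTP/1.1" 200 3477
10.0.0.6 - - [17/Nov/2019:19:07:52 +0100] "GET /favicon.ico HTTP/1.1" 404 486'

# to_csv is a made-up name; it reads a log on stdin and writes IP;count;"request".
to_csv() {
    awk '{
        ip = $1
        $1 = $2 = $3 = $4 = $5 = ""
        sub(/^ +/, "")                    # what remains is the request and what follows it
        n[ip SUBSEP $0]++                 # count each IP + request combination
    }
    END {
        for (k in n) {
            split(k, part, SUBSEP)
            gsub(/"/, "\"\"", part[2])    # double embedded quotes, CSV style
            printf "%s;%d;\"%s\"\n", part[1], n[k], part[2]
        }
    }' | sort -t ';' -k1,1 -k2,2nr        # group by IP, highest count first
}

printf '%s\n' "$sample" | to_csv
```

Redirecting the output, for example `to_csv < access_log > memory.csv`, would produce the file; the IP is repeated on every row here so that each line stays a complete record.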
Upvotes: 1