Reputation: 17785
I have Tomcat access logs that look like:
247.134.70.3 - - [07/May/2012:17:53:58 +0000] 93 "POST /maker/www/jsp/pulse/Service.ajax HTTP/1.1" 200 2 - "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.2; Trident/4.0; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; .NET4.0C; .NET4.0E)"
247.134.70.3 - - [07/May/2012:17:53:58 +0000] 140 "POST /maker/www/jsp/pulse/Service.ajax HTTP/1.1" 200 2 - "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.2; Trident/4.0; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; .NET4.0C; .NET4.0E)"
...
...
The number after the datetime stamp is the server-side processing time in milliseconds. I need the max and average time for a particular request, e.g. "POST /maker/www/jsp/pulse/Service.ajax", over ten-minute periods. Is this possible using sed or awk, or do I need something with more logic, e.g. Python?
Thanks.
Upvotes: 1
Views: 428
Reputation: 246827
awk -F '[][ ]' -v start="07/May/2012:17:50:00" -v stop="07/May/2012:18:00:00" '
# Fields are split on "[", "]" and spaces, so $5 is the timestamp
# and $8 is the processing time in ms.
start <= $5 && $5 < stop {
    split($0, arr, /"/)    # the request line is the first quoted field
    req = arr[2]
    count[req] += 1
    total[req] += $8
    if ($8 > max[req]) max[req] = $8
}
END {
    for (req in count) {
        print req
        print " num entries: " count[req]
        print " avg time: " total[req]/count[req]
        print " max time: " max[req]
    }
}
' logfile
Given your small sample, this outputs:
POST /maker/www/jsp/pulse/Service.ajax HTTP/1.1
num entries: 2
avg time: 116.5
max time: 140
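If you need the figures for every ten-minute period rather than a single fixed window, a rough sketch along the same lines (assuming the timestamp layout shown in your sample) is to truncate the timestamp to the tens digit of the minute and key the arrays on both that bucket and the request:
awk -F '[][ ]' '
{
    split($0, arr, /"/)
    req = arr[2]
    # 07/May/2012:17:53:58 -> bucket 07/May/2012:17:5x (covers 17:50-17:59)
    bucket = substr($5, 1, 16) "x"
    key = bucket "  " req
    count[key] += 1
    total[key] += $8
    if ($8 > max[key]) max[key] = $8
}
END {
    for (key in count)
        print key, count[key], total[key]/count[key], max[key]
}
' logfile
Each output line then shows the ten-minute bucket and request, followed by the entry count, average time and maximum time.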
Upvotes: 1
Reputation: 360103
awk -F '[ "]' '{requests[$8 $9] += $6; count[$8 $9]++; if ($6 > max[$8 $9]) {max[$8 $9] = $6}} END {for (request in requests) {print requests[request], count[request], requests[request] / count[request], max[request], request}}' inputfile
Broken out on separate lines:
awk -F '[ "]' '{
requests[$8 $9] += $6;
count[$8 $9]++;
if ($6 > max[$8 $9]) {
max[$8 $9] = $6
}
}
END {
for (request in requests) {
print requests[request], count[request], requests[request] / count[request], max[request], request
}
}' inputfile
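The output columns are the total time, the entry count, the average time, the maximum time and the request. If you only want a single ten-minute window, one option (a sketch, assuming the bracketed timestamp is the fourth whitespace-separated field) is to filter the log first and pipe the result into the command above in place of reading inputfile directly:
awk '$4 >= "[07/May/2012:17:50" && $4 < "[07/May/2012:18:00"' inputfile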
Upvotes: 2