I am trying to count accesses per minute from Apache logs that look like this:
domain.com:10.10.10.10 - - [26/Mar/2014:14:14:12 +0000] "GET /online_catalogue/files/flash/libs/framework_4.6.0.23201.swz HTTP/1.0" 200 327044 "http://www.domain.com/online_catalogue/files/flash/flippingbook.swf?key=foobar" "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko"
perl -ne '$a{$1}++ if /\[(.+?:[0-9]{2}:[0-9]{2})/; END { foreach $k(keys %a) { print "$k $a{$k}\n"; } }' logfile | sort
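For reference, here is the same per-minute count sketched in Python. The `re` module stands in for Perl's regex engine (the pattern is copied unchanged) and `collections.Counter` plays the role of the `%a` hash. The function name `count_per_minute` and the sample lines are made up for illustration; the lines are trimmed versions of the log line above.

```python
import re
from collections import Counter

# Same pattern as the Perl one-liner: group 1 is "dd/Mon/yyyy:HH:MM",
# i.e. the timestamp truncated to the minute.
MINUTE_RE = re.compile(r"\[(.+?:[0-9]{2}:[0-9]{2})")


def count_per_minute(lines):
    """Return a Counter mapping 'dd/Mon/yyyy:HH:MM' -> number of requests."""
    counts = Counter()
    for line in lines:
        m = MINUTE_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        'domain.com:10.10.10.10 - - [26/Mar/2014:14:14:12 +0000] "GET /a.swz HTTP/1.0" 200 1',
        'domain.com:10.10.10.10 - - [26/Mar/2014:14:14:59 +0000] "GET /index.php HTTP/1.0" 200 1',
        'domain.com:10.10.10.10 - - [26/Mar/2014:14:15:01 +0000] "GET /index.php HTTP/1.0" 200 1',
    ]
    # sorted() plays the role of "| sort" in the shell pipeline
    for minute, n in sorted(count_per_minute(sample).items()):
        print(minute, n)
    # prints:
    # 26/Mar/2014:14:14 2
    # 26/Mar/2014:14:15 1
```

Like the one-liner, this counts every request, including the .swz file.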
This works, but I want to exclude requests for static files such as .swz, .css, .gif, .png and .jpg.
I tried altering the regex to
\[(.+?:[0-9]{2}:[0-9]{2}).+?(?:POST|GET) \/[^ ]+(?!\.swz|\.gif|\.css|\.jpg)
but it still matches those lines. I want requests for these files not to match at all.
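Running the attempted pattern against the sample line shows the problem. This sketch uses Python's `re` module as a stand-in for Perl; both engines handle this look-ahead and backtracking the same way. Only the request part of the sample line is kept.

```python
import re

# The attempted pattern from the question, unchanged.
ATTEMPT = re.compile(
    r"\[(.+?:[0-9]{2}:[0-9]{2}).+?(?:POST|GET) \/[^ ]+(?!\.swz|\.gif|\.css|\.jpg)"
)

# Sample line from the question, with referrer and user agent dropped.
LINE = ('domain.com:10.10.10.10 - - [26/Mar/2014:14:14:12 +0000] '
        '"GET /online_catalogue/files/flash/libs/framework_4.6.0.23201.swz HTTP/1.0" 200 327044')

m = ATTEMPT.search(LINE)
print(m is not None)                # True: the static file is still matched
# [^ ]+ has already consumed ".swz", so the look-ahead is tested at the
# space before HTTP/1.0, where no extension can start, and it succeeds.
print(m.group(0).endswith(".swz"))  # True
```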
The [^ ]+ is consuming the whole filename, so the negative look-ahead is only tested at the space that follows it, where none of the extensions can start, and it always succeeds.
Try adding another [^ ] after the negative look-ahead to prevent matches including the entire filename:
\[(.+?:[0-9]{2}:[0-9]{2}).+?(?:POST|GET) \/[^ ]+(?!\.swz|\.gif|\.css|\.jpg)[^ ]
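It is worth testing a candidate pattern against the sample line before relying on it. In this sketch (Python's `re` standing in for Perl, with the same backtracking behaviour) the .swz line still matches: [^ ]+ gives back its last character so the final [^ ] can match it, and the look-ahead is then tested at that last "z", where none of the extensions can begin.

```python
import re

# Pattern from this answer: an extra [^ ] after the negative look-ahead.
SUGGESTED = re.compile(
    r"\[(.+?:[0-9]{2}:[0-9]{2}).+?(?:POST|GET) \/[^ ]+(?!\.swz|\.gif|\.css|\.jpg)[^ ]"
)

# Sample line from the question, with referrer and user agent dropped.
LINE = ('domain.com:10.10.10.10 - - [26/Mar/2014:14:14:12 +0000] '
        '"GET /online_catalogue/files/flash/libs/framework_4.6.0.23201.swz HTTP/1.0" 200 327044')

m = SUGGESTED.search(LINE)
print(m is not None)     # True: the static file is still matched
print(m.group(0)[-4:])   # .swz  (the final [^ ] matched the "z")
```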
A small change to your regex fixes the problem:
\[(.+?:[0-9]{2}:[0-9]{2}).+?(?:POST|GET) \/(?![^ ]+(\.swz|\.gif|\.css|\.jpg))[^ ]+
Right after GET / or POST /, the negative look-ahead first checks that the path does not contain .swz, .gif, .css or .jpg; only if that check passes does [^ ]+ go on to match the filename.
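Here is a sketch of the whole filter with this pattern, again in Python as a stand-in for the Perl one-liner. Assumptions of the sketch: \.png has been added to the list (the question mentions png, the pattern above does not), the function name `count_dynamic_per_minute` is hypothetical, and the log lines are invented examples in the question's format.

```python
import re
from collections import Counter

# The answer's pattern with \.png added (an assumption, see above).
# Group 1 is still the minute; the group inside the look-ahead is group 2.
DYNAMIC_RE = re.compile(
    r"\[(.+?:[0-9]{2}:[0-9]{2}).+?(?:POST|GET) \/"
    r"(?![^ ]+(\.swz|\.gif|\.css|\.jpg|\.png))[^ ]+"
)


def count_dynamic_per_minute(lines):
    """Count requests per minute, skipping the listed static file types."""
    counts = Counter()
    for line in lines:
        m = DYNAMIC_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts


if __name__ == "__main__":
    log = [
        'domain.com:10.10.10.10 - - [26/Mar/2014:14:14:12 +0000] '
        '"GET /online_catalogue/files/flash/libs/framework_4.6.0.23201.swz HTTP/1.0" 200 327044',
        'domain.com:10.10.10.10 - - [26/Mar/2014:14:14:30 +0000] '
        '"GET /online_catalogue/index.php HTTP/1.0" 200 512',
        'domain.com:10.10.10.10 - - [26/Mar/2014:14:14:31 +0000] '
        '"GET /online_catalogue/style.css HTTP/1.0" 200 99',
        'domain.com:10.10.10.10 - - [26/Mar/2014:14:15:02 +0000] '
        '"POST /online_catalogue/search HTTP/1.0" 200 1024',
    ]
    for minute, n in sorted(count_dynamic_per_minute(log).items()):
        print(minute, n)
    # prints:
    # 26/Mar/2014:14:14 1
    # 26/Mar/2014:14:15 1
```

Because the capturing group inside the look-ahead is opened after the timestamp group, $1 still holds the minute, so the pattern can be dropped into the Perl one-liner unchanged. Two limitations are worth knowing: [^ ]+ needs at least one character after the slash, so a request for the bare root (GET / HTTP/1.0) is not counted ([^ ]* would include it), and the check is case-sensitive, so /IMG.JPG is still counted.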