Ananda

Reputation: 1572

Error in Parsing Apache Log with RegEX?

I am parsing the following Apache log entry:

59.167.203.103 - - [28/May/2013:03:12:47 +0000] "POST /some/some.htm HTTP/1.1" 200 1187 "-" "xyzf/2.00.16 xyzNetwork/609.1.4 xyzwin/13.0.0"

with the RegEx given below, and it works fine:

String logentrypattern = "^([\\d.]+) (\\S+) (\\S+) \\[([\\w:/]+\\s[+\\-]\\d{4})\\] \"(.+?)\" (\\d{3}) (\\d+) \"([^\"]+)\" \"([^\"]+)\"";

But in a few entries the response bytes field is "-" instead of a number. Those entries produce the following error, saying the line cannot be parsed. Please help.

Bad log entry (or problem with RE?):
89.178.46.54 - - [24/May/2013:17:04:59 +0000] "PUT /xyz-pmp/xyz-pmp.htm HTTP/1.1" 200 - "-" "kdm/1.0"

Upvotes: 1

Views: 290

Answers (1)

Jerry

Reputation: 71538

You could try this:

^([\\d.]+) (\\S+) (\\S+) \\[([\\w:/]+\\s[+\\-]\\d{4})\\] \"(.+?)\" (\\d{3}) (\\d+|-) \"([^\"]+)\" \"([^\"]+)\"
                                                                                 ^^

I added the alternation so that the response bytes field can also be a dash. It might be better to use a \\S+ block there instead; it all depends on what exactly you're doing. If the intent is to accept only entries with a numeric byte count, then your regex is working as intended. If it's just to capture the different parts of each entry, make sure you know the structure of the data and the different forms it can take.
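As a rough sketch of how this could look in Java, here is the pattern applied to both of your sample lines. The class name LogEntryDemo and the responseBytes helper are hypothetical, and treating "-" as 0 bytes is only an assumption about what you want; adjust it to whatever your code expects.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogEntryDemo {
    // Same pattern as in the question, except that group 7 accepts
    // either digits or a lone dash.
    static final Pattern LOG_ENTRY = Pattern.compile(
        "^([\\d.]+) (\\S+) (\\S+) \\[([\\w:/]+\\s[+\\-]\\d{4})\\] \"(.+?)\" (\\d{3}) (\\d+|-) \"([^\"]+)\" \"([^\"]+)\"");

    // Hypothetical helper: returns the response size, mapping "-" to 0
    // (an assumption), or -1 when the line does not match the pattern.
    static long responseBytes(String line) {
        Matcher m = LOG_ENTRY.matcher(line);
        if (!m.find()) {
            return -1L;
        }
        String bytes = m.group(7);
        return "-".equals(bytes) ? 0L : Long.parseLong(bytes);
    }

    public static void main(String[] args) {
        String withBytes = "59.167.203.103 - - [28/May/2013:03:12:47 +0000] "
            + "\"POST /some/some.htm HTTP/1.1\" 200 1187 \"-\" "
            + "\"xyzf/2.00.16 xyzNetwork/609.1.4 xyzwin/13.0.0\"";
        String withDash = "89.178.46.54 - - [24/May/2013:17:04:59 +0000] "
            + "\"PUT /xyz-pmp/xyz-pmp.htm HTTP/1.1\" 200 - \"-\" \"kdm/1.0\"";

        System.out.println(responseBytes(withBytes)); // prints 1187
        System.out.println(responseBytes(withDash));  // prints 0
    }
}
```

If you go with \\S+ instead of \\d+|- for that group, Long.parseLong will throw a NumberFormatException on any non-numeric value other than "-", so you would need to handle that case yourself.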

Upvotes: 1
