Reputation: 1572
I am parsing the following Apache log entry:
59.167.203.103 - - [28/May/2013:03:12:47 +0000] "POST /some/some.htm HTTP/1.1" 200 1187 "-" "xyzf/2.00.16 xyzNetwork/609.1.4 xyzwin/13.0.0"
with the regex below, and it works fine:
String logentrypattern = "^([\\d.]+) (\\S+) (\\S+) \\[([\\w:/]+\\s[+\\-]\\d{4})\\] \"(.+?)\" (\\d{3}) (\\d+) \"([^\"]+)\" \"([^\"]+)\"";
But in a few entries the response bytes field is "-" instead of a number. Those entries fail to parse with the following error. Please help.
Bad log entry (or problem with RE?):
89.178.46.54 - - [24/May/2013:17:04:59 +0000] "PUT /xyz-pmp/xyz-pmp.htm HTTP/1.1" 200 - "-" "kdm/1.0"
Upvotes: 1
Views: 290
Reputation: 71538
You could try this:
^([\\d.]+) (\\S+) (\\S+) \\[([\\w:/]+\\s[+\\-]\\d{4})\\] \"(.+?)\" (\\d{3}) (\\d+|-) \"([^\"]+)\" \"([^\"]+)\"
The only change is in the response-bytes group, which is now (\\d+|-) so that it also accepts a dash. Maybe a \\S+ block would suit you better there? It depends on what you're doing exactly. If the intent is to accept only the entries with digits, then your regex is working as intended. If it's just to capture the different parts of the entries, make sure you know the structure of the data and the different forms it can come in.
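To make that concrete, here is a minimal sketch of how the adjusted pattern could be used from Java. The class name LogEntryDemo and the helper responseBytes are made up for illustration, and mapping "-" to 0 is an assumption on my part (Apache's %b writes "-" when no bytes were sent), so adapt it to whatever your code expects.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; only the regex itself comes from the question.
class LogEntryDemo {
    // Same pattern as in the question, except group 7 also accepts a lone "-".
    static final Pattern LOG_ENTRY = Pattern.compile(
        "^([\\d.]+) (\\S+) (\\S+) \\[([\\w:/]+\\s[+\\-]\\d{4})\\] \"(.+?)\" (\\d{3}) (\\d+|-) \"([^\"]+)\" \"([^\"]+)\"");

    // Returns the response size in bytes, treating "-" as 0 (an assumption),
    // or -1 if the line does not match the pattern at all.
    static long responseBytes(String line) {
        Matcher m = LOG_ENTRY.matcher(line);
        if (!m.find()) {
            return -1L;
        }
        String raw = m.group(7);
        return raw.equals("-") ? 0L : Long.parseLong(raw);
    }

    public static void main(String[] args) {
        String withBytes = "59.167.203.103 - - [28/May/2013:03:12:47 +0000] "
            + "\"POST /some/some.htm HTTP/1.1\" 200 1187 \"-\" "
            + "\"xyzf/2.00.16 xyzNetwork/609.1.4 xyzwin/13.0.0\"";
        String withDash = "89.178.46.54 - - [24/May/2013:17:04:59 +0000] "
            + "\"PUT /xyz-pmp/xyz-pmp.htm HTTP/1.1\" 200 - \"-\" \"kdm/1.0\"";

        System.out.println(responseBytes(withBytes)); // 1187
        System.out.println(responseBytes(withDash));  // 0
    }
}
```

Checking for "-" before calling Long.parseLong matters here, because parsing the dash directly would throw a NumberFormatException even though the regex now matches.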
Upvotes: 1