Optional fields when matching log file rows using regex

Question

I'm trying to parse a web log with regular expressions using RegexSerDe. It maps each regex capture group to a column in a table, and if a group is empty it assigns NULL to that column.

I'm having trouble matching log rows with missing fields. There are two kinds of rows in this log:

<134>2016-10-23T23:59:59Z cache-iad2134 fastly[502801]: 52.55.94.131 "-" "-" Sun, 23 Oct 2016 23:59:59 GMT GET /apps/events/2016/10/11/3062653/?REC_ID=3062653&id=0 200

<134>2016-10-23T23:59:59Z cache-dfw1835 fastly[502801]: 1477267199

I wrote the regex below, which matches the first kind of row (the one with all fields):

^(\S+) (\S+) (\S+) (\S+) "(\S+)" "(\S+)" (.*) (\d{3})

I played around with ? to make the fields after the first four optional, but I kept messing up the columns.

Any suggestions on how I should add the ? without changing the number of groups (so that the deserializer doesn't cough up)? Or is there another way to do this that you would suggest?

Answers (1)
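
One possible approach, offered as a sketch rather than a tested Hive setup: keep all eight capture groups, but wrap the last four (together with their separating spaces and quotes) in a single optional non-capturing group, (?: ... )?, and anchor the end with $. A non-capturing group does not add to the group count, so the pattern still exposes eight groups. On a short row the optional part is skipped and groups 5 to 8 do not participate in the match. The snippet below checks this with Python's re module; RegexSerDe is built on Java regular expressions, and the constructs used here are assumed to behave the same way there. LOG_PATTERN and parse are names made up for this illustration.

```python
import re

# Hypothetical pattern: the original eight capture groups are kept, but the
# last four sit inside one optional non-capturing group, so the group count
# stays at eight whether or not the trailing fields are present.
LOG_PATTERN = re.compile(
    r'^(\S+) (\S+) (\S+) (\S+)'
    r'(?: "(\S+)" "(\S+)" (.*) (\d{3}))?$'
)

# The two sample rows from the question.
FULL_ROW = (
    '<134>2016-10-23T23:59:59Z cache-iad2134 fastly[502801]: 52.55.94.131 '
    '"-" "-" Sun, 23 Oct 2016 23:59:59 GMT GET '
    '/apps/events/2016/10/11/3062653/?REC_ID=3062653&id=0 200'
)
SHORT_ROW = '<134>2016-10-23T23:59:59Z cache-dfw1835 fastly[502801]: 1477267199'


def parse(row):
    """Return the eight captured fields as a tuple, or None if the row does not match."""
    m = LOG_PATTERN.match(row)
    return m.groups() if m else None


full = parse(FULL_ROW)
short = parse(SHORT_ROW)

print(LOG_PATTERN.groups)  # number of capture groups in the pattern
print(full)
print(short)               # trailing four fields are None for the short row
```

If the deserializer turns non-participating groups into NULL, as the question describes for empty groups, the short rows should load with the first four columns filled and the rest NULL. When the pattern is placed in the table's input.regex property, the backslashes will most likely need to be doubled; that part is not verified here.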
