Stefano Lazzaro

Reputation: 487

Regex parsing custom Apache log with added field after "size" field

I'm using a regex parser to parse Apache log lines in the standard format, plus an added field between 'size' and 'referer' that I'll call 'elapsed'. I can't correctly parse the fields after the 'elapsed' field. Can you help?

Try it on rubular

Line to parse:

10.1.1.1 - - [16/Nov/2022:15:34:38 +0000] "GET /server-status HTTP/1.1" 200 32 0 "-" "kube-probe/1.21"

Regex:

^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*) (?<elapsed>[^ ]*) (?: "(?<referer>[^\"]*)" "(?<agent>.*)")?

Result:

host    10.121.17.62
user    -
time    16/Nov/2022:15:34:38 +0000
method  GET
path    /server-status
code    200
size    32057
elapsed 0
referer  
agent

Expected result:


host    10.121.17.62
user    -
time    16/Nov/2022:15:34:38 +0000
method  GET
path    /server-status
code    200
size    32057
elapsed 0
referer -    
agent   kube-probe/1.21

Upvotes: 1

Views: 63

Answers (1)

Wiktor Stribiżew

Reputation: 626927

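The space before the optional non-capturing group ((?: "(?<referer>[^\"]*)" "(?<agent>.*)")?) must be removed. As written, the pattern requires two spaces between the 'elapsed' value and the opening quote of the referer (one before the group and one inside it), but the log line has only one. The optional group therefore matches nothing, and 'referer' and 'agent' stay empty.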
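To illustrate the effect of that single space, here is a minimal Ruby sketch (Ruby because the question tests on Rubular). It runs the question's pattern and the same pattern with the space removed against the sample line from the question. The variable names `sample_line`, `with_space`, and `without_space` are only illustrative.

```ruby
# Sample line copied from the question.
sample_line = '10.1.1.1 - - [16/Nov/2022:15:34:38 +0000] "GET /server-status HTTP/1.1" 200 32 0 "-" "kube-probe/1.21"'

# Pattern from the question: note the space between (?<elapsed>...) and (?: ".
with_space = /^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*) (?<elapsed>[^ ]*) (?: "(?<referer>[^\"]*)" "(?<agent>.*)")?/

# Same pattern with that one space removed.
without_space = /^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*) (?<elapsed>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>.*)")?/

broken = with_space.match(sample_line)
fixed  = without_space.match(sample_line)

# The optional group is skipped in the first case, so both captures are nil.
p broken[:referer], broken[:agent]
# With the space removed, the group matches and both fields are captured.
p fixed[:referer], fixed[:agent]
```

Both patterns still match the line as a whole, which is why the problem shows up as empty fields rather than as a failed match.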

However, I would recommend using:

^(?<host>\S+) \S+ (?<user>\S+) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>\S+) +\S*)?" (?<code>\S+) (?<size>\S+) (?<elapsed>\S+)(?: "(?<referer>[^"]*)" "(?<agent>.*)")?

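See the regex demo. Here, I replaced [^ ]* with \S+, since that is what is meant: match one or more non-whitespace characters. Note that \S+ requires at least one character, whereas [^ ]* would also accept an empty field.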
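As a quick check, the recommended pattern can be applied to the sample line from the question in Ruby. This is only a sketch: `LOG_PATTERN` and `parse_log_line` are hypothetical names introduced for illustration, not part of any library.

```ruby
# Recommended pattern from this answer, stored in a constant (name is illustrative).
LOG_PATTERN = /^(?<host>\S+) \S+ (?<user>\S+) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>\S+) +\S*)?" (?<code>\S+) (?<size>\S+) (?<elapsed>\S+)(?: "(?<referer>[^"]*)" "(?<agent>.*)")?/

# Hypothetical helper: returns a Hash of field name => value, or nil if the
# line does not match the pattern at all.
def parse_log_line(line)
  match = LOG_PATTERN.match(line)
  match && match.named_captures
end

# Sample line copied from the question.
line = '10.1.1.1 - - [16/Nov/2022:15:34:38 +0000] "GET /server-status HTTP/1.1" 200 32 0 "-" "kube-probe/1.21"'

fields = parse_log_line(line)

# Print each captured field on its own line.
fields.each { |name, value| puts "#{name}\t#{value}" }
```

Because the trailing group is optional, a line that ends right after the 'elapsed' value still matches, with 'referer' and 'agent' left as nil.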

Upvotes: 1
