Reputation: 2266
I have a small issue with the Fluentd log parser. I have a Varnish server on which I have set up the X-Forwarded-For header to contain the list of IPs of all the hosts an HTTP request goes through. I use this to get that information into the varnishncsa logs. This is an example of a log line:
"192.168.79.16, 192.22.10.22, 10.2.2.22 - - [13/Aug/2015:09:50:45 +0000] \"GET http://poc.mydomain.com/panier/payment/payline?notificationType=WEBTRS&token=1KB01BwKWdUhVj1222301439454223514 HTTP/1.1\" 401 0 \"-\" \"Java/1.8.0_45\""
On the other hand, I would like to aggregate these logs in Fluentd. Since varnishncsa logs use the Apache format, I use the apache2 Fluentd format for input parsing, as in this configuration:
<source>
type tail
format apache2
path /var/log/varnish/varnishncsa.log
pos_file /var/log/td-agent/tmp/access.log.pos
tag "apache2.varnish.mydomain.com.access"
</source>
Now the problem is that this works when there is only one host IP in the log line, but when there are multiple IPs, the Fluentd aggregator reports a "pattern not match" warning. That is:
This matches :
"192.168.79.16 - - [13/Aug/2015:09:50:45 +0000] \"GET http://poc.mydomain.com/panier/payment/payline?notificationType=WEBTRS&token=1KB01BwKWdUhVj1222301439454223514 HTTP/1.1\" 401 0 \"-\" \"Java/1.8.0_45\""
This doesn't match :
"192.168.79.16, 192.22.10.22, 10.2.2.22 - - [13/Aug/2015:09:50:45 +0000] \"GET http://poc.mydomain.com/panier/payment/payline?notificationType=WEBTRS&token=1KB01BwKWdUhVj1222301439454223514 HTTP/1.1\" 401 0 \"-\" \"Java/1.8.0_45\""
The apache2 fluentd regex is :
^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
With this time format :
%d/%b/%Y:%H:%M:%S %z
I have tried to find and test the right regex for this, but have not found one yet.
I tried this, but it doesn't work:
<source>
type tail
format format /^(?<host>\,*[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$/
time_format %d/%b/%Y:%H:%M:%S %z
path /var/log/varnish/varnishncsa.log
pos_file /var/log/td-agent/tmp/access.log.pos
tag "apache2.varnish.mydomain.com.access"
</source>
Can someone help? I would also appreciate good documentation on Fluentd parser pattern capturing, and a good way to test Fluentd regexes. The Fluentd regular expression editor doesn't really help.
It always generates a configuration, without giving a test result.
Thanks.
Upvotes: 1
Views: 3669
Reputation: 627469
Here is the regex you can use in case you have multiple IPs:
^(?<host>[^ ]*(?:,\s+[^ ]+)*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
              ^^^^^^^^^^^^^^
See demo on a good Web regex tester
The (?:,\s+[^ ]+)* pattern matches 0 or more (*) sequences of a comma (,), 1 or more whitespace symbols (\s+), and 1 or more characters other than a space ([^ ]+).
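If you want to check this locally rather than in a web tester, here is a minimal Ruby sketch. Fluentd is written in Ruby, so the pattern can be pasted verbatim into a regex literal. The constant names (MULTI_IP_PATTERN, SAMPLE_LINE) are mine, and the sample is the raw log line from the question with the outer quoting removed; this only exercises the regex and does not go through Fluentd itself.

```ruby
# Relaxed pattern from above, pasted verbatim into a Ruby regex literal.
MULTI_IP_PATTERN = /^(?<host>[^ ]*(?:,\s+[^ ]+)*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$/

# Raw varnishncsa line from the question (three comma-separated IPs).
SAMPLE_LINE = '192.168.79.16, 192.22.10.22, 10.2.2.22 - - [13/Aug/2015:09:50:45 +0000] "GET http://poc.mydomain.com/panier/payment/payline?notificationType=WEBTRS&token=1KB01BwKWdUhVj1222301439454223514 HTTP/1.1" 401 0 "-" "Java/1.8.0_45"'

match = MULTI_IP_PATTERN.match(SAMPLE_LINE)

if match
  # Print every named capture so you can see what Fluentd would extract.
  match.names.each { |name| puts "#{name}: #{match[name]}" }
else
  puts "pattern not match"
end
```

The host capture should come out as the whole comma-separated list, with the rest of the fields unchanged from the stock apache2 format.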
A bit safer expression will look like:
^(?<host>(?:\d+\.){3}\d+(?:,\s*(?:\d+\.){3}\d+)*|-) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
See Demo 2
The (?:\d+\.){3}\d+(?:,\s*(?:\d+\.){3}\d+)* pattern matches number + . + number + . + number + . + number, optionally followed by more such patterns separated by commas.
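To see how the stricter variant behaves, here is a small Ruby sketch under the same assumptions as before: the regex is pasted verbatim, and the names (STRICT_PATTERN, REQUEST_TAIL, the helper hosts_for) are hypothetical ones I made up for illustration. The "localhost" line is an invented example of a host that is not a dotted number, not something from the real logs.

```ruby
# Stricter pattern from above: host must be one or more dotted-number
# addresses separated by commas, or a single "-".
STRICT_PATTERN = /^(?<host>(?:\d+\.){3}\d+(?:,\s*(?:\d+\.){3}\d+)*|-) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$/

# Everything after the host field, shortened from the question's sample.
REQUEST_TAIL = ' - - [13/Aug/2015:09:50:45 +0000] "GET http://poc.mydomain.com/panier/payment/payline HTTP/1.1" 401 0 "-" "Java/1.8.0_45"'

# Hypothetical helper: returns the list of IPs in the host field,
# or nil when the line does not match the strict pattern.
def hosts_for(line)
  match = STRICT_PATTERN.match(line)
  match && match[:host].split(/,\s*/)
end

p hosts_for("192.168.79.16, 192.22.10.22, 10.2.2.22" + REQUEST_TAIL)
p hosts_for("192.168.79.16" + REQUEST_TAIL)
p hosts_for("localhost" + REQUEST_TAIL) # not a dotted number, so no match
```

Note that \d+ does not limit each number to 0-255, so this is a shape check rather than real IP validation.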
Upvotes: 2