Reputation: 1582
I am trying to parse nginx logs using Logstash. Everything looks fine, except that lines containing an nginx $remote_user get tagged with _grokparsefailure. When $remote_user is '-' (the default value when no $remote_user is specified), Logstash does the job, but with a real $remote_user like [email protected] it fails and adds a _grokparsefailure tag:
127.0.0.1 - - [17/Feb/2017:23:14:08 +0100] "GET /favicon.ico HTTP/1.1" 302 169 "http://training-hub.tn/trainer/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36"
=====> Works fine
127.0.0.1 - [email protected] [17/Feb/2017:23:14:07 +0100] "GET /trainer/templates/home.tmpl.html HTTP/1.1" 304 0 "http://training-hub.tn/trainer/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36"
=====> _grokparsefailure tag, and the log line fails to parse
I am using this configuration file:
input {
  file {
    path => "/home/dev/node/training-hub/logs/access_log"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    ignore_older => 0
    type => "logs"
  }
}
filter {
  if [type] == "logs" {
    mutate {
      gsub => ["message", "::ffff:", ""]
    }
    grok {
      match => [
        "message", "%{COMBINEDAPACHELOG}+%{GREEDYDATA:extra_fields}",
        "message", "%{COMMONAPACHELOG}+%{GREEDYDATA:extra_fields}"
      ]
      overwrite => [ "message" ]
    }
    mutate {
      convert => ["response", "integer"]
      convert => ["bytes", "integer"]
      convert => ["responsetime", "float"]
    }
    geoip {
      source => "clientip"
      target => "geoip"
      database => "/etc/logstash/GeoLite2-City.mmdb"
      add_field => [ "[geoip][coordinates]", "%{[geoip][longitude]}" ]
      add_field => [ "[geoip][coordinates]", "%{[geoip][latitude]}" ]
    }
    mutate {
      convert => [ "[geoip][coordinates]", "float" ]
    }
    date {
      match => [ "timestamp", "dd/MMM/YYYY:HH:mm:ss Z" ]
      remove_field => [ "timestamp" ]
    }
    useragent {
      source => "agent"
    }
  }
}
output { elasticsearch { hosts => "localhost:9200" } }
Upvotes: 1
Views: 750
Reputation: 1582
After testing the output with many values, I realized that Logstash fails to parse log lines containing such a $remote_user because an email address is not accepted as a valid username. So I added a mutate gsub filter that removes the @ and the rest of the email address, leaving a valid $remote_user.
gsub => ["message", "@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\]) \[", " ["]
And now it works fine.
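A shorter variant of the same idea, offered as an untested sketch rather than a drop-in replacement: instead of matching a full email-domain grammar, strip everything from the @ up to the space that precedes the opening bracket of the timestamp. The pattern below is my own assumption and is not part of the original configuration. It assumes the user name contains no spaces and that the timestamp field directly follows it, as in the sample lines in the question. The mutate block has to run before grok so that grok sees the cleaned line.

```
filter {
  if [type] == "logs" {
    mutate {
      # Assumed simplification: drop "@domain" only when it is directly
      # followed by " [", i.e. the start of the [timestamp] field.
      # The lookahead (?= \[) keeps the space and bracket in place.
      gsub => ["message", "@[^ ]+(?= \[)", ""]
    }
    # ... the existing grok filter follows here
  }
}
```

With a made-up user field such as user@example.com, this would leave just user in that position. Like the longer regex, it would also rewrite any other "@something [" sequence elsewhere in the line, so it is worth checking against real referrer and user-agent values before relying on it.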
Upvotes: 0