Reputation: 1241
I have two related questions. The first is how best to grok logs that have "messy" spacing and so on; the second, which I'll ask separately, is how to deal with logs that have arbitrary attribute-value pairs. (See: logstash grok filter for logs with arbitrary attribute-value pairs)
So for the first question, I have a log line that looks like this:
14:46:16.603 [http-nio-8080-exec-4] INFO METERING - msg=93e6dd5e-c009-46b3-b9eb-f753ee3b889a CREATE_JOB job=a820018e-7ad7-481a-97b0-bd705c3280ad data=71b1652e-16c8-4b33-9a57-f5fcb3d5de92
Using http://grokdebug.herokuapp.com/ I was able to eventually come up with the following grok pattern that works for this line:
%{TIME:timestamp} %{NOTSPACE:http} %{WORD:loglevel}%{SPACE}%{WORD:logtype} - msg=%{NOTSPACE:msg}%{SPACE}%{WORD:action}%{SPACE}job=%{NOTSPACE:job}%{SPACE}data=%{NOTSPACE:data}
With the following config file:
input {
  file {
    path => "/home/robyn/testlogs/trimmed_logs.txt"
    start_position => "beginning"
    sincedb_path => "/dev/null" # for testing; allows reparsing
  }
}
filter {
  grok {
    match => {"message" => "%{TIME:timestamp} %{NOTSPACE:http} %{WORD:loglevel}%{SPACE}%{WORD:logtype} - msg=%{NOTSPACE:msg}%{SPACE}%{WORD:action}%{SPACE}job=%{NOTSPACE:job}%{SPACE}data=%{NOTSPACE:data}" }
  }
}
output {
  file {
    path => "/home/robyn/filteredlogs/trimmed_logs.out.txt"
  }
}
I get the following output:
{"message":"14:46:16.603 [http-nio-8080-exec-4] INFO METERING - msg=93e6dd5e-c009-46b3-b9eb-f753ee3b889a CREATE_JOB job=a820018e-7ad7-481a-97b0-bd705c3280ad data=71b1652e-16c8-4b33-9a57-f5fcb3d5de92","@version":"1","@timestamp":"2015-08-07T17:55:16.529Z","host":"hlt-dev","path":"/home/robyn/testlogs/trimmed_logs.txt","timestamp":"14:46:16.603","http":"[http-nio-8080-exec-4]","loglevel":"INFO","logtype":"METERING","msg":"93e6dd5e-c009-46b3-b9eb-f753ee3b889a","action":"CREATE_JOB","job":"a820018e-7ad7-481a-97b0-bd705c3280ad","data":"71b1652e-16c8-4b33-9a57-f5fcb3d5de92"}
That's pretty much what I want, but the pattern feels really kludgy, particularly the heavy use of %{SPACE} and %{NOTSPACE}. This suggests to me that I'm not doing this the best possible way. Should I be creating a more specific pattern for the hex ids? I think I need the %{SPACE} between loglevel and logtype because of the extra space between INFO and METERING in the log, but that also feels kludgy.
Also, how do I get the log's timestamp to replace @timestamp? That field seems to hold the time Logstash ingested the log, which we don't want or need.
Obviously I'm just getting started with ELK and grok, so pointers to useful resources are also appreciated.
Upvotes: 2
Views: 4471
Reputation: 155
You can also use \s* instead of the SPACE pattern. In the stock grok patterns, SPACE is itself defined as \s*, so the two behave the same.
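As a sketch, here is the question's grok filter with each %{SPACE} replaced by an inline \s*. Nothing else in the pattern is changed; this assumes Logstash passes the backslashes in a double-quoted config string through to the regex unchanged, which is its default behaviour.

```logstash
filter {
  grok {
    # same pattern as in the question, but with inline \s* (zero or more
    # whitespace characters) wherever %{SPACE} was used
    match => { "message" => "%{TIME:timestamp} %{NOTSPACE:http} %{WORD:loglevel}\s*%{WORD:logtype} - msg=%{NOTSPACE:msg}\s*%{WORD:action}\s*job=%{NOTSPACE:job}\s*data=%{NOTSPACE:data}" }
  }
}
```

This mainly changes readability. If you want at least one whitespace character to be required between two tokens, \s+ is the stricter choice.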
To delete fields, you can use the mutate filter, which has an option called "remove_field": https://www.elastic.co/guide/en/logstash/current/plugins-filters-mutate.html#plugins-filters-mutate-remove_field
If you delete @timestamp, you have to set up a new index pattern in Kibana, because Kibana sorts events by the @timestamp field unless you choose another time field.
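A minimal sketch of one way to make the log's own time end up in @timestamp. It uses the date filter, which is a different approach from the one this answer describes, and then mutate's remove_field to drop the raw string. It assumes the grok filter from the question has already produced a "timestamp" field such as 14:46:16.603. The timezone value is a placeholder assumption.

```logstash
filter {
  # ... grok filter from the question goes here ...

  # Parse the grok-captured "timestamp" field and write the result into
  # @timestamp (the date filter's default target).
  date {
    match    => [ "timestamp", "HH:mm:ss.SSS" ]
    timezone => "UTC"   # assumption: set this to the zone your app logs in
  }

  # Drop the now-redundant raw string field with mutate's remove_field.
  mutate {
    remove_field => [ "timestamp" ]
  }
}
```

Two caveats. First, the log line carries only a time of day, so check which date the date filter fills in before relying on it. Second, mutate removes the field even if date parsing failed. Putting remove_field on the date filter itself is an alternative, because common options like remove_field only run when that filter succeeds.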
Upvotes: 0
Reputation: 217274
There is an existing pattern you can use instead of NOTSPACE: it's UUID. Also, when there's a single space, there's no need to use the SPACE pattern; you can leave it out. I'm also using the USERNAME pattern (maybe wrongly named) just for the sake of capturing the http field.
So it would go like this, and you need only a single SPACE pattern to capture the multiple spaces.
Sample log line:
14:46:16.603 [http-nio-8080-exec-4] INFO METERING - msg=93e6dd5e-c009-46b3-b9eb-f753ee3b889a CREATE_JOB job=a820018e-7ad7-481a-97b0-bd705c3280ad data=71b1652e-16c8-4b33-9a57-f5fcb3d5de92
Grok pattern:
%{TIME:timestamp} \[%{USERNAME:http}\] %{WORD:loglevel}%{SPACE}%{WORD:logtype} - msg=%{UUID:msg} %{WORD:action} job=%{UUID:job} data=%{UUID:data}
Grok will spit this out:
{
"timestamp": [
[
"14:46:16.603"
]
],
"HOUR": [
[
"14"
]
],
"MINUTE": [
[
"46"
]
],
"SECOND": [
[
"16.603"
]
],
"http": [
[
"http-nio-8080-exec-4"
]
],
"loglevel": [
[
"INFO"
]
],
"SPACE": [
[
" "
]
],
"logtype": [
[
"METERING"
]
],
"msg": [
[
"93e6dd5e-c009-46b3-b9eb-f753ee3b889a"
]
],
"action": [
[
"CREATE_JOB"
]
],
"job": [
[
"a820018e-7ad7-481a-97b0-bd705c3280ad"
]
],
"data": [
[
"71b1652e-16c8-4b33-9a57-f5fcb3d5de92"
]
]
}
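As a sketch, this is how the pattern above might slot into the question's config file. It assumes the input section stays exactly as in the question. The stdout output with the rubydebug codec is an addition for inspecting events while testing, not part of the original setup.

```logstash
filter {
  grok {
    # UUID instead of NOTSPACE for the hex ids; literal single spaces
    # everywhere except between loglevel and logtype, where %{SPACE}
    # absorbs the variable run of spaces
    match => { "message" => "%{TIME:timestamp} \[%{USERNAME:http}\] %{WORD:loglevel}%{SPACE}%{WORD:logtype} - msg=%{UUID:msg} %{WORD:action} job=%{UUID:job} data=%{UUID:data}" }
  }
}
output {
  # print each parsed event to the console while testing
  stdout { codec => rubydebug }
}
```

The grok debugger output above also lists unnamed sub-captures such as HOUR, MINUTE, SECOND and SPACE. By default, Logstash's grok filter keeps only named captures (named_captures_only => true), so those should not appear as fields on the event. Because the brackets are now matched literally, the http field should also come out as http-nio-8080-exec-4 without them.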
Upvotes: 3