Reputation: 2025
When filtering logs, Logstash can use grok to parse the incoming log lines (say, Nginx logs). Parsing with grok requires you to map each field to the right pattern, e.g., %{HTTPDATE:timestamp}.
However, if Nginx starts logging in JSON format, then Logstash does very little processing. It simply creates the index and outputs to Elasticsearch. This leads me to believe that only Elasticsearch benefits from the "way" it receives the index.
Is there any advantage for Elasticsearch in having index data that was parsed with regex vs. JSON? E.g., does it impact query time?
Upvotes: 0
Views: 310
Reputation: 7463
For Elasticsearch, it doesn't matter how you parse the messages; it has no information about that. You only need to send a JSON document with the fields that you want to store and search on, according to your index mapping.
However, how you parse the message does matter for Logstash, since it directly affects performance.
For example, consider the following message:
2020-04-17 08:10:50,123 [26] INFO ApplicationName - LogMessage From The Application
If you want to be able to search and apply filters on each part of this message, you will need to parse it into fields.
timestamp: 2020-04-17 08:10:50,123
thread: 26
loglevel: INFO
application: ApplicationName
logmessage: LogMessage From The Application
To parse this message you can use different filters. One of them is grok, which uses regex, but if your message always has the same format, you can use another filter such as dissect. In this case both achieve the same result, but while grok uses regex to match the fields, dissect is purely positional. This makes a huge difference in CPU use when you have a high number of events per second.
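As a rough sketch of the two approaches for the sample message above, the filter blocks below should produce the same fields. The field names come from the list above; the choice of grok patterns (TIMESTAMP_ISO8601, NUMBER, LOGLEVEL, NOTSPACE, GREEDYDATA) and the integer conversion of thread are my assumptions, so adjust them to your real log format. Use one of the two, not both.

```
# Option A: grok (regex based, tolerant of format variations, more CPU)
filter {
  grok {
    match => {
      "message" => "%{TIMESTAMP_ISO8601:timestamp} \[%{NUMBER:thread:int}\] %{LOGLEVEL:loglevel} %{NOTSPACE:application} - %{GREEDYDATA:logmessage}"
    }
  }
}

# Option B: dissect (splits on the literal delimiters between fields, no regex)
filter {
  dissect {
    mapping => {
      "message" => "%{timestamp} [%{thread}] %{loglevel} %{application} - %{logmessage}"
    }
    # dissect extracts strings; convert thread to a number (assumed wanted)
    convert_datatype => {
      "thread" => "int"
    }
  }
}
```

dissect only works here because every line has the same delimiters in the same order; if the layout varies between lines, grok (or dissect followed by a small grok on one field) is the safer choice.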
Consider now that you have the same message, but in a JSON format.
{ "timestamp":"2020-04-17 08:10:50,123", "thread":26, "loglevel":"INFO", "application":"ApplicationName","logmessage":"LogMessage From The Application" }
It is easier and faster for Logstash to parse this message. You can do it in your input using the json codec, or you can use the json filter in your filter block.
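A minimal sketch of both options follows. The file input and its path are hypothetical placeholders, not something from your setup; the json filter assumes the raw line is in the default message field.

```
# Option A: decode JSON as events are read (codec on the input)
input {
  file {
    path => "/var/log/app/app.json.log"   # hypothetical path
    codec => "json"
  }
}

# Option B: read lines as plain text, then decode in the filter block.
# Use this instead of Option A, not in addition to it.
filter {
  json {
    source => "message"
  }
}
```

Either way, the keys of the JSON object (timestamp, thread, loglevel, application, logmessage) become fields on the event without any pattern matching.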
If you have control over how your log messages are created, choose a format that does not require grok.
Upvotes: 1