bulkmoustache

Reputation: 2025

Filtering with regex vs json

When filtering logs, Logstash may use grok to parse the received log file (let's say it is Nginx logs). Parsing with grok requires you to properly set the field type - e.g., %{HTTPDATE:timestamp}.

However, if Nginx starts logging in JSON format, then Logstash does very little processing. It simply creates the index and outputs to Elasticsearch. This leads me to believe that only Elasticsearch benefits from the "way" it receives the index.

Is there any advantage for Elasticsearch in having index data that was processed with regex vs. JSON? E.g., does it impact query time?

Upvotes: 0

Views: 310

Answers (1)

leandrojmp

Reputation: 7463

For Elasticsearch it doesn't matter how you parse the messages; it has no information about that. You only need to send a JSON document with the fields that you want to store and search on, according to your index mapping.

However, how you parse the message matters for Logstash, since it directly affects performance.

For example, consider the following message:

2020-04-17 08:10:50,123 [26] INFO ApplicationName - LogMessage From The Application

If you want to be able to search and apply filters on each part of this message, you will need to parse it into fields.

timestamp: 2020-04-17 08:10:50,123
thread: 26
loglevel: INFO
application: ApplicationName
logmessage: LogMessage From The Application

To parse this message you can use different filters. One of them is grok, which uses regex, but if your message always has the same format you can use another filter, such as dissect. In this case both achieve the same thing, but while grok uses regex to match the fields, dissect is purely positional: it splits on the delimiters between fields. This makes a huge difference in CPU use when you have a high number of events per second.
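
To make the difference concrete, here is a minimal Python sketch of the two approaches applied to the example message. It is only an analogy, not how Logstash implements grok or dissect: the regex, the function names parse_with_regex and parse_by_position, and the delimiter choices are assumptions made for illustration.

```python
import re

LINE = "2020-04-17 08:10:50,123 [26] INFO ApplicationName - LogMessage From The Application"

# Regex approach (analogous to grok): a pattern engine has to match
# every field against the text.
PATTERN = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"\[(?P<thread>\d+)\] "
    r"(?P<loglevel>[A-Z]+) "
    r"(?P<application>\S+) - "
    r"(?P<logmessage>.*)"
)


def parse_with_regex(line):
    """Return the named groups as a dict, or None if the line does not match."""
    match = PATTERN.fullmatch(line)
    return match.groupdict() if match else None


def parse_by_position(line):
    """Positional approach (analogous to dissect): cut on fixed delimiters in order."""
    timestamp, rest = line.split(" [", 1)
    thread, rest = rest.split("] ", 1)
    loglevel, rest = rest.split(" ", 1)
    application, logmessage = rest.split(" - ", 1)
    return {
        "timestamp": timestamp,
        "thread": thread,
        "loglevel": loglevel,
        "application": application,
        "logmessage": logmessage,
    }


regex_fields = parse_with_regex(LINE)
positional_fields = parse_by_position(LINE)
```

Both functions produce the same fields for this line. The positional version never validates the content of a field, which is why it only suits messages whose layout never changes.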

Consider now that you have the same message, but in a JSON format.

{ "timestamp":"2020-04-17 08:10:50,123", "thread":26, "loglevel":"INFO", "application":"ApplicationName","logmessage":"LogMessage From The Application" } 

It is easier and faster for Logstash to parse this message. You can do it in your input using the json codec, or you can use the json filter in your filter block.

If you have control over how your log messages are created, choose a format that lets you avoid grok.
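
As a rough illustration of why the JSON case is simpler, the sketch below decodes the same JSON message with Python's standard json module. This is an analogy only, not Logstash's json codec or filter, and the names RAW and parse_json_event are hypothetical.

```python
import json

RAW = (
    '{ "timestamp":"2020-04-17 08:10:50,123", "thread":26, "loglevel":"INFO", '
    '"application":"ApplicationName","logmessage":"LogMessage From The Application" }'
)


def parse_json_event(raw):
    """Decode one JSON log line into a dict; no per-field pattern is needed."""
    return json.loads(raw)


event = parse_json_event(RAW)
```

The field names and types come straight from the producer here (for example, thread arrives as a number rather than a string), so the parser needs no knowledge of the message layout.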

Upvotes: 1
