sruthi

Reputation: 91

How does the grok filter work in Logstash?

I am writing a Logstash configuration file.

I have a grok filter. I would like to know how the match in the grok filter works exactly.

I looked at an example on the Logstash site and saw the following:

Example log line: 55.3.244.1 GET /index.html 15824 0.043
It is parsed with the filter below:

filter {
  grok {
    match => { "message" => "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}" }
  }
}

Does this mean the pattern is matched against the whole log line sequentially? My log lines differ from one another and do not always follow a fixed format.
They look like the ones below:

 1. 11:10:15---somedata
 2. 11:10:20---source--destination-- somedata
 3. somedata

I want to capture all three types of lines. Should I write separate match filters, or is it fine to capture the source, destination, and somedata fields separately in a single match?

Any information on this would be appreciated.

Yes, I do understand the basics of regex and grok patterns, but I am still confused about how to write the match block for the following.

line 1: timestamp source destination a=0,b=1,c=3,d=4
line 2: timestamp a=1,e=5, b=1
line 3: g=0

Suppose I have these 3 lines in my log file and I want to capture the lines that have a value for b or for g. What would my match block look like?

match => message ["b=":variable_b,"g=":variable_g]

Will this capture all the lines containing b and g? For b it should capture lines 1 and 2; for g it should capture line 3. So should my output contain all three lines? Is this how it works, or would it throw a grokparse error?

Upvotes: 0

Views: 1763

Answers (1)

baudsp

Reputation: 4100

The grok filter works with the patterns in the match block. It works like a regex (see here for the definition). Each pattern is composed of two parts: %{SYNTAX:SEMANTIC}.
If the regex built from the patterns matches the line, the text matched by SYNTAX is added as a field named SEMANTIC. The expression is not anchored, so it can match anywhere in the line unless you add ^ and $ yourself.
See the documentation for more information.
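
As a rough illustration of the SYNTAX/SEMANTIC split, here is a minimal sketch that assumes a line shaped like "11:10:15---somedata" from the question. The field names timestamp and somedata are only illustrative choices, and the ^ and $ anchors are added here on purpose to force the expression to cover the whole line:

```
filter {
  grok {
    # TIME and GREEDYDATA are the SYNTAX parts (predefined grok patterns);
    # timestamp and somedata are the SEMANTIC parts (fields that receive the matched text).
    # ^ and $ make the expression span the entire line; without them,
    # grok accepts a match found anywhere inside the line.
    match => { "message" => "^%{TIME:timestamp}---%{GREEDYDATA:somedata}$" }
  }
}
```

With this sketch, the line "11:10:15---somedata" should end up with timestamp set to "11:10:15" and somedata set to "somedata".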

You can have more than one grok pattern in your filter:

grok {
    match => {
        "message" => [
            "%{TIME:timestamp}--%{DATA:source}--%{DATA:destination}--%{GREEDYDATA:somedata}",
            "%{TIME:timestamp}--%{GREEDYDATA:somedata}",
            "%{GREEDYDATA:somedata}"
        ]
    }
}

Also, from Chro's comment: by default, the grok filter attempts to match the patterns in the order they are supplied. So if you put the third one (the GREEDYDATA one) first, it will simply match that and then leave the filter. You can make it try every pattern by setting break_on_match to false (it is true by default).
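
To make the break_on_match behaviour concrete, here is a small sketch. It assumes a hypothetical line such as "a=1,b=2,g=3" that contains both keys; the field names b and g are illustrative:

```
filter {
  grok {
    # Try every pattern instead of stopping at the first one that matches.
    break_on_match => false
    match => {
      "message" => [
        "b=%{NUMBER:b}",
        "g=%{NUMBER:g}"
      ]
    }
  }
}
```

With the default (break_on_match => true), that hypothetical line would only get the b field, because grok stops after the first pattern that matches. With false, both b and g should be added.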


With your update:

In your case, if you have those lines:

timestamp source destination a=0,b=1,c=3,d=4
timestamp a=1,e=5, b=1
g=0

and you wish to extract the b and g values and nothing else, you'll have to use more than one pattern: one to grab the b value, the other for the g value:

match => {
    "message" => [
        "b=%{NUMBER:b}",
        "g=%{NUMBER:g}"
    ]
}

Logstash processes the logs line by line, and the output is the result of the processing done on each line. The grok filter attempts to parse each line with the patterns and adds fields if the parsing succeeds. It does not capture (select) the lines.
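
If the goal is to keep only the lines that contain b or g, one possible approach is to rely on the tag grok adds when no pattern matches (by default _grokparsefailure) and drop those events. This is a sketch under that assumption, not the only way to do it:

```
filter {
  grok {
    match => {
      "message" => [
        "b=%{NUMBER:b}",
        "g=%{NUMBER:g}"
      ]
    }
  }

  # Events that matched none of the patterns carry the default failure tag.
  if "_grokparsefailure" in [tags] {
    drop { }
  }
}
```

With the three sample lines, each one matches one of the two patterns, so none is dropped. A line with neither key (for example a hypothetical "a=1,e=5") would be tagged and then discarded.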

Upvotes: 1
