Tony

Reputation: 2076

Logstash: Nested grok searches? Parsing a field into multiple fields?

I have log entries that look like this...

2014-02-25 00:00:03,936 INFO  - something happened...bla bla bla
2014-02-25 00:00:03,952 INFO  - ***Request Completed*** [   78.002] mS [http://cloud.mydomain.local/schedule/search?param=45]
2014-02-25 00:00:04,233 INFO  - something else happened...bla bla bla

I have a grok filter that correctly parses the lines...

grok {
    match => [ "message", "%{TIMESTAMP_ISO8601:logdate} %{WORD:severity}%{SPACE}- %{GREEDYDATA:body}" ]
}

I'd like to parse additional data out of 'body' if 'body' begins with "***Request Completed***", namely 'elapsedms' and 'uri'. How can I do this?

Elsewhere it was suggested that I add another message entry to the grok filter like this...

grok {
    match => [
        "message", "%{TIMESTAMP_ISO8601:logdate} %{WORD:severity}%{SPACE}- \*\*\*Request Completed\*\*\* \[%{SPACE}%{NUMBER:elapsedms}\] mS \[%{URI:uri}\]",
        "message", "%{TIMESTAMP_ISO8601:logdate} %{WORD:severity}%{SPACE}- %{GREEDYDATA:body}"
    ]
}

...this works, but for timing lines the value of 'body' does NOT get set. Ideally I'd like 'body' to always contain the last part of the entry and, only if the entry is a timing line, perform additional parsing of 'elapsedms' and 'uri'.

Any ideas how I can do this?

Is there a means to parse fields, such that I could attempt to parse 'body' into elapsedms/uri and, if that fails, continue? Or is there a means to nest field matches in the grok expression?

Thoughts?

Edit: Rather than making sure 'body' is always set, could I just create 'body' from 'elapsedms' and 'uri' if 'elapsedms' is set?

Upvotes: 4

Views: 5808

Answers (3)

Patrick Hoeffel

Reputation: 113

Here is a better way that is known to work in Logstash 1.5.3:

grok {
   match => [ 
          "message", "%{TIMESTAMP_ISO8601:logdate} %{WORD:severity}%{SPACE}- %{GREEDYDATA:body}"
         ]
}

# if body is set (which should always be true, but it's good to check anyway)
if [body] {
    grok {
        break_on_match => true
        match => [
            "body", "\*\*\*Request Completed\*\*\* \[%{SPACE}%{NUMBER:elapsedms}\] mS \[%{URI:uri}\]"
        ]
    }
}

This way, every record will have a body field, but only the lines that contain "***Request Completed***" will have elapsedms and uri fields. You can continue this logic with sub-sub fields and sub-sub-sub fields as far down into the weeds as you like.

I also included the "break_on_match" option in case that is helpful. It defaults to true, which means grok stops after the first successful match; with a single pattern, as here, it has no effect either way.

The key is to use the body field (or whichever field you're parsing) as the match source rather than message.
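One caveat with the two-stage approach above: when the second grok runs on a line that is not a timing line, it fails to match and, by default, tags the event with _grokparsefailure. A minimal sketch of one way around that, assuming the same patterns as above, is to guard the second grok with a regex conditional so it only runs on timing lines. The ":float" suffix is an optional extra that asks grok to store the value as a number rather than a string.

```
filter {
    # Stage 1: split every line into logdate, severity and body.
    grok {
        match => [
            "message", "%{TIMESTAMP_ISO8601:logdate} %{WORD:severity}%{SPACE}- %{GREEDYDATA:body}"
        ]
    }

    # Stage 2: run the second grok only when body starts with the timing marker,
    # so ordinary lines are never tagged with _grokparsefailure.
    if [body] =~ /^\*\*\*Request Completed\*\*\*/ {
        grok {
            match => [
                "body", "\*\*\*Request Completed\*\*\* \[%{SPACE}%{NUMBER:elapsedms:float}\] mS \[%{URI:uri}\]"
            ]
        }
    }
}
```

If you would rather keep the plain "if [body]" check, setting tag_on_failure => [] on the second grok should have a similar effect, at the cost of also hiding malformed timing lines.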

Upvotes: 1

GGGforce

Reputation: 653

I believe you need to use the break_on_match option within grok and set it to false: http://logstash.net/docs/1.4.2/filters/grok#break_on_match

Upvotes: 0

Tony

Reputation: 2076

This works. Is there a better way?

grok {
    match => [
        "message", "%{TIMESTAMP_ISO8601:logdate} %{WORD:severity}%{SPACE}- \*\*\*Request Completed\*\*\* \[%{SPACE}%{NUMBER:elapsedms}\] mS \[%{URI:uri}\]",
        "message", "%{TIMESTAMP_ISO8601:logdate} %{WORD:severity}%{SPACE}- %{GREEDYDATA:body}"
    ]
}

# if body is NOT set (timing line) make one
if ![body] {
    mutate { 
        add_field => [ "body", "***Request Completed*** [%{elapsedms}] mS [%{uri}]"] 
    }
}
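
A possible alternative that avoids rebuilding 'body' afterwards, offered as an untested sketch: grok accepts Oniguruma-style named captures of the form (?<name>...), and such a capture can, as far as I know, wrap other grok patterns. Under that assumption, the timing pattern can capture 'body' and the nested fields in one pass, so the mutate step is no longer needed and 'body' keeps the original text, including the padding inside the brackets.

```
grok {
    match => [
        # Timing lines: (?<body>...) captures the whole tail, while the
        # patterns nested inside it still produce elapsedms and uri.
        "message", "%{TIMESTAMP_ISO8601:logdate} %{WORD:severity}%{SPACE}- (?<body>\*\*\*Request Completed\*\*\* \[%{SPACE}%{NUMBER:elapsedms}\] mS \[%{URI:uri}\])",
        # Everything else: body is simply the rest of the line.
        "message", "%{TIMESTAMP_ISO8601:logdate} %{WORD:severity}%{SPACE}- %{GREEDYDATA:body}"
    ]
}
```

The order matters here: the more specific timing pattern has to come first, because the general pattern would match timing lines too.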

Upvotes: 3
