VijayKarthikeyan

Reputation: 27

Hard to stash a log file with Logstash when the order of fields varies

I am trying to stash a log file into Elasticsearch using Logstash, and I am running into a problem.

If the log file contains log lines of the same form, like the ones below,

[12/Sep/2016:18:23:07] VendorID=5037 Code=C AcctID=5317605039838520
[12/Sep/2016:18:23:22] VendorID=9108 Code=A AcctID=2194850084423218
[12/Sep/2016:18:23:49] VendorID=1285 Code=F AcctID=8560077531775179
[12/Sep/2016:18:23:59] VendorID=1153 Code=D AcctID=4433276107716482

where the order of the date, VendorID, Code, and AcctID fields never changes and no new element is added, then the filter (given below) in the config file works well.

\[%{MONTHDAY}/%{MONTH}/%{YEAR}:%{TIME}\] VendorID=%{INT:VendorID} Code=%{WORD:Code} AcctID=%{INT:AcctID}

But suppose the order changes, as in the example given below, or a new element is added to one of the log lines; then a _grokparsefailure occurs.

[12/Sep/2016:18:23:07] VendorID=5037 Code=C AcctID=5317605039838520
[12/Sep/2016:18:23:22] VendorID=9108 Code=A AcctID=2194850084423218
[12/Sep/2016:18:23:49] VendorID=1285 Code=F AcctID=8560077531775179
[12/Sep/2016:18:23:59] VendorID=1153 Code=D AcctID=4433276107716482
[12/Sep/2016:18:24:50] AcctID=3168124750473449 VendorID=1065 Code=L
[12/Sep/2016:18:24:50] AcctID=3168124750473449 VendorID=1065 Code=L
[12/Sep/2016:18:24:50] AcctID=3168124750473449 VendorID=1065 Code=L

In this example, the field order of the last three log lines differs from that of the first four. Because of this, the grok pattern above, written for the first four lines, cannot parse the last three.

How should I handle this scenario when I come across such a case? Please help me solve this problem, and point me to any document with a detailed explanation and examples.

Thank you very much in advance.

Upvotes: 1

Views: 236

Answers (1)

pandaadb

Reputation: 6456

As correctly pointed out by baudsp, this can be achieved with multiple grok filters. The kv filter seems like a nicer option, but as far as grok goes, this is one solution:

input {
  stdin {}
}

filter {
  # Each grok matches its field independently of position in the line
  grok {
    match => { "message" => ".*test1=%{INT:test1}.*" }
  }
  grok {
    match => { "message" => ".*test2=%{INT:test2}.*" }
  }
}

output {
  stdout { codec => rubydebug }
}

By applying two separate grok filters, we can disregard the order of the fields in the incoming lines. The patterns do not care about what comes before or after the string they match; each one simply matches its own pattern on its own.
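The same multi-grok approach carries over to the log format from the question. A sketch (the field names and patterns are lifted from the question; the timestamp capture name is my own addition):

    filter {
      # Timestamp always leads the line, so it can keep its anchored pattern
      grok {
        match => { "message" => "\[%{MONTHDAY}/%{MONTH}/%{YEAR}:%{TIME:time}\]" }
      }
      # Each remaining field is matched independently, wherever it appears
      grok {
        match => { "message" => "VendorID=%{INT:VendorID}" }
      }
      grok {
        match => { "message" => "Code=%{WORD:Code}" }
      }
      grok {
        match => { "message" => "AcctID=%{INT:AcctID}" }
      }
    }
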

So, for these two strings:

test1=12 test2=23
test2=23 test1=12

You will get the correct output. Test:

artur@pandaadb:~/dev/logstash$ ./logstash-2.3.2/bin/logstash -f conf_grok_ordering/
Settings: Default pipeline workers: 8
Pipeline main started
test1=12 test2=23
{
       "message" => "test1=12 test2=23",
      "@version" => "1",
    "@timestamp" => "2016-12-21T16:48:24.175Z",
          "host" => "pandaadb",
         "test1" => "12",
         "test2" => "23"
}
test2=23 test1=12
{
       "message" => "test2=23 test1=12",
      "@version" => "1",
    "@timestamp" => "2016-12-21T16:48:29.567Z",
          "host" => "pandaadb",
         "test1" => "12",
         "test2" => "23"
}
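
As for the kv filter mentioned above: since the lines are space-separated key=value pairs, kv can parse them in any order. A sketch, assuming the log line is in the default message field (include_keys restricts parsing to the known keys; the bracketed timestamp would still need a grok to extract it):

    filter {
      kv {
        source       => "message"
        field_split  => " "
        value_split  => "="
        include_keys => ["VendorID", "Code", "AcctID"]
      }
    }
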

Hope that helps

Upvotes: 0
