user84592

Reputation: 4882

Ignore and move to next pattern if log contains a specific word

I have a log file produced by a Spring application. It contains three formats. The first two are single-line formats: if a line contains the keyword app-info, the message was printed by our own developers; if not, it was printed by the Spring framework. We may want to treat developers' messages differently from Spring framework messages. The third format is a multiline stack trace.

Here is an example of our own format:

2018-04-27 10:42:49 [http-nio-8088-exec-1] - INFO  - app-info - injectip ip 192.168.16.89

The line above contains the app-info keyword, so it comes from our own developers.

2018-04-27 10:42:23 [RMI TCP Connection(10)-127.0.0.1] - INFO  - org.apache.catalina.core.ContainerBase.[Tomcat].[localhost].[/] - Initializing Spring FrameworkServlet 'dispatcherServlet'

The line above does not contain the app-info keyword, so it was printed by the Spring framework.

In my Grok filter, the first pattern is for messages printed by the Spring framework, the second is for our developers' messages, and the third is for multiline stack traces. I want the first regex to state explicitly that the Spring framework pattern does not contain the keyword app-info, so that a developer line fails to parse with it and falls through to the second pattern, which is our developers' own format. I tried the following regex in a regex tool, but I got a compile error:

(?<timestamp>[\d\-\s\:]+)\s\[(?<threadname>[\d\.\w\s\(\)\-]+)\]\s-\s(?<loglevel>[\w]+)\s+-\s+(?<systemmsg>[^((?app-info).)*\s\.\w\-\'\:\d\[\]\/]+)
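A likely cause of the compile error, sketched with Python's `re` module (an assumption: Grok runs on Oniguruma/Joni rather than Python, but as far as I know it also rejects reversed ranges such as `[p-i]`). Inside `[...]`, the text `(?app-info)` is not a group but a set of single characters, and `p-i` is read as a range from `p` down to `i`. The helper `compiles` is hypothetical, for illustration only.

```python
import re

def compiles(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True

# The systemmsg part of the failing regex, copied literally.
# Inside [...], "p-i" is parsed as a character range, and 'p' sorts after 'i'.
broken = r"[^((?app-info).)*\s\.\w\-\'\:\d\[\]\/]+"
print(compiles(broken))  # False

# Even with the hyphen escaped, a negated class rejects single characters
# (a, p, -, i, n, f, o), not the word "app-info".
no_letters = re.compile(r"[^app\-info]+")
print(no_letters.fullmatch("spring"))  # None: 'p', 'i' and 'n' are excluded
```

So a character class can never express "does not contain this word"; that needs a lookahead outside the brackets.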

For the Grok filter itself, I followed the instructions from this link, which lists several patterns in one match:

filter {
  grok {
    match => [ "message", "PATTERN1", "PATTERN2", "PATTERN3" ]
  }
}

My current Logstash configuration, which does not explicitly exclude app-info in the first pattern, is as follows:

filter {
  grok {
    match => [
      "message",
        '(?<timestamp>[\d\-\s\:]+)\s\[(?<threadname>[\d\.\w\s\(\)\-]+)\]\s-\s(?<loglevel>[\w]+)\s+-\s+(?<systemmsg>[\s\.\w\-\'\:\d\[\]\/^[app-info]]+)',
        '(?<timestamp>[\d\-\s\:]+)\s\[(?<threadname>[\d\.\w\s\(\)\-]+)\]\s-\s(?<loglevel>[\w]+)\s+-\s(?<appinfo>app-info)\s-\s(?<systemmsg>[\w\d\:\{\}\,\-\(\)\s\"]+)',
        '(?<timestamp>[\d\-\s\:]+)\s\[(?<threadname>[\w\-\d]+)\]\s-\s(?<loglevel>[\w]+)\s\-\s(?<appinfo>app-info)\s-\s(?<params>params):(?<jsonstr>[\"\w\d\,\:\.\{\}]+)\s(?<exceptionname>[\w\d\.]+Exception):\s(?<exceptiondetail>[\w\d\.]+)\n\t(?<extralines>at[\s\w\.\d\~\?\n\t\(\)\_\[\]\/\:\-]+)\n\d'
      
    ]      
  }

}

With the above Logstash configuration, when handling the line

2018-04-27 10:42:49 [http-nio-8088-exec-1] - INFO  - app-info - injectip ip 192.168.16.89

The first pattern (the Spring framework pattern) already matches, so the line never reaches the second pattern, which is our developers' own format. The parser returns the following result:
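To illustrate why the line never reaches the second pattern, here is a small Python sketch of how Grok treats a list of patterns with its default `break_on_match => true`: the first pattern that matches wins. `PATTERN1` and `PATTERN2` are simplified stand-ins for the config above (Python spells named groups `(?P<name>...)`), and `first_match` is a hypothetical helper.

```python
import re

# Simplified stand-in for pattern 1: systemmsg accepts any text,
# much like the permissive character class in the original.
PATTERN1 = re.compile(
    r"(?P<timestamp>[\d\-\s:]+)\s\[(?P<threadname>[\d.\w\s()\-]+)\]\s-\s"
    r"(?P<loglevel>\w+)\s+-\s+(?P<systemmsg>.+)"
)
# Simplified stand-in for pattern 2 (developers' messages).
PATTERN2 = re.compile(
    r"(?P<timestamp>[\d\-\s:]+)\s\[(?P<threadname>[\d.\w\s()\-]+)\]\s-\s"
    r"(?P<loglevel>\w+)\s+-\s(?P<appinfo>app-info)\s-\s(?P<systemmsg>.+)"
)

def first_match(line, patterns):
    """Return (index, fields) of the first pattern that matches, else (None, {})."""
    for i, pattern in enumerate(patterns):
        m = pattern.search(line)  # Grok patterns are unanchored by default
        if m:
            return i, m.groupdict()
    return None, {}

line = "2018-04-27 10:42:49 [http-nio-8088-exec-1] - INFO  - app-info - injectip ip 192.168.16.89"
idx, fields = first_match(line, [PATTERN1, PATTERN2])
print(idx, fields["systemmsg"])  # 0 app-info - injectip ip 192.168.16.89
```

Pattern 2 would also match this line, but it is never tried because pattern 1 succeeds first.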

{
  "timestamp": [
    [
      "2018-04-27 10:42:49"
    ]
  ],
  "threadname": [
    [
      "http-nio-8088-exec-1"
    ]
  ],
  "loglevel": [
    [
      "INFO"
    ]
  ],
  "systemmsg": [
    [
      "app-info - injectip ip 192.168.16.89\n\n"
    ]
  ]
}

Any hints on how to make the first pattern state explicitly that systemmsg must not contain the keyword "app-info"?

EDIT:

My goal: if the keyword app-info is absent, pattern 1 should handle the log; if app-info is present, pattern 2 should handle it.

Take the following log line, which does not contain the keyword app-info (so pattern 1 should match):

2018-04-27 10:42:23 [RMI TCP Connection(10)-127.0.0.1] - INFO  - org.apache.catalina.core.ContainerBase.[Tomcat].[localhost].[/] - Initializing Spring FrameworkServlet 'dispatcherServlet'

I modified the first pattern following your suggestion, as shown below, but it gives no match for this line, which is not what I want:

(?<timestamp>[\d\-\s\:]+)\s\[(?<threadname>[\d\.\w\s\(\)\-]+)\]\s-\s(?<loglevel>[\w]+)\s+-\s+(?<systemmsg>[^(?:(?!app\-info).)*\s\.\w\-\'\:\d\[\]\/]+)

See demo. My goal is to extract the timestamp, thread name, log level and system message, but the first pattern does not give the expected result: the tool says there is no match.

If I remove ^(?:(?!app-info).)*, the pattern parses the log line above (the one without app-info); see demo. But then it also matches log lines that contain app-info, which is not what I want. For those lines I want to extract the timestamp, thread name, log level, app-info (as its own field, present or not) and then the system message. The expectation is that the first pattern fails so that the second pattern handles the log. As the demo shows, the first pattern also matches a line containing app-info, and systemmsg then absorbs app-info into its value, which is not expected.

So I want pattern 1 to handle logs without the keyword app-info and pattern 2 to handle logs with it. In other words, pattern 1 must fail to match whenever the line contains app-info.
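A minimal Python sketch of why the modified pattern finds no match and where the tempered token `(?:(?!app-info).)` has to go instead. In the modified pattern above, `^` directly after `[` negates the whole class, and the class also lists `\w`, `\s` and `\d`, so systemmsg cannot match letters at all. Placed outside any brackets, the token consumes a character only if `app-info` does not start at that position. The field layout follows the question's regex; the `$` anchor and the sample variable names are assumptions for illustration.

```python
import re

# The systemmsg class from the modified pattern: negated, and it contains \w,
# so a plain word such as "org" is rejected.
negated_class = re.compile(r"[^(?:(?!app\-info).)*\s\.\w\-\'\:\d\[\]\/]+")
print(negated_class.fullmatch("org"))  # None

# Tempered token outside the class: every character of systemmsg is checked
# with (?!app-info), and $ forces systemmsg to run to the end of the line.
PATTERN1 = re.compile(
    r"^(?P<timestamp>[\d\-\s:]+)\s\[(?P<threadname>[\d.\w\s()\-]+)\]\s-\s"
    r"(?P<loglevel>\w+)\s+-\s+"
    r"(?P<systemmsg>(?:(?!app-info).)+)$"
)

spring = ("2018-04-27 10:42:23 [RMI TCP Connection(10)-127.0.0.1] - INFO  - "
          "org.apache.catalina.core.ContainerBase.[Tomcat].[localhost].[/] - "
          "Initializing Spring FrameworkServlet 'dispatcherServlet'")
app = "2018-04-27 10:42:49 [http-nio-8088-exec-1] - INFO  - app-info - injectip ip 192.168.16.89"

print(PATTERN1.match(spring).group("loglevel"))  # INFO
print(PATTERN1.match(app))  # None, so Grok would fall through to pattern 2
```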

Upvotes: 2

Views: 3617

Answers (2)

xs2rashid

Reputation: 1053

I used GREEDYDATA for this. Suppose you have the following log line:

Redirect Controller: successful redirection for click data: {a:123, b:345}

and you want to capture everything up to "data". Then use GREEDYDATA as follows:

%{GREEDYDATA}data:%{SPACE}%{rest of the pattern}
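In the stock grok-patterns file, GREEDYDATA is defined as `.*` and SPACE as `\s*`. A Python sketch of the same idea, where the `payload` group is a hypothetical stand-in for `%{rest of the pattern}`:

```python
import re

# Expansions of the stock grok patterns used above.
GREEDYDATA = r".*"
SPACE = r"\s*"

# "payload" is a hypothetical name for whatever follows "data:".
pattern = re.compile(GREEDYDATA + "data:" + SPACE + r"(?P<payload>.*)")

line = "Redirect Controller: successful redirection for click data: {a:123, b:345}"
print(pattern.match(line).group("payload"))  # {a:123, b:345}
```

Because `.*` is greedy, it skips to the last "data:" in the line; use `%{DATA}` (`.*?`) if you need the first occurrence.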

Upvotes: 0

Sufiyan Ghori

Reputation: 18743

My goal is to let pattern 1 handle logs without the keyword app-info. If app-info is present, the first pattern should fail to parse, so that the second pattern can handle the log.

You can use the following as your first pattern:

(?<data>^(?!.*app-info).*)%{LOGLEVEL:log}%{DATA:other_data}%{IP:ip}$

The negative lookahead (?!.*app-info) right after ^ fails whenever app-info appears anywhere in the line, so this pattern does not match and Grok moves on to the second pattern.
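A Python translation of this pattern, run on the two example lines below. The grok macros are replaced by simplified stand-ins (assumptions: the real LOGLEVEL covers more spellings and IP also accepts IPv6; DATA is `.*?` in the stock patterns).

```python
import re

# Simplified stand-ins for %{LOGLEVEL}, %{DATA} and %{IP}.
LOGLEVEL = r"(?:TRACE|DEBUG|INFO|WARN|ERROR|FATAL)"
DATA = r".*?"
IPV4 = r"\d{1,3}(?:\.\d{1,3}){3}"

answer_pattern = re.compile(
    r"(?P<data>^(?!.*app-info).*)"  # rejects the line if app-info occurs anywhere
    + r"(?P<log>" + LOGLEVEL + r")"
    + r"(?P<other_data>" + DATA + r")"
    + r"(?P<ip>" + IPV4 + r")$"
)

without = "2018-04-27 10:42:49 [http-nio-8088-exec-1] - INFO  injectip ip 192.168.16.89"
with_app_info = "2018-04-27 10:42:49 [http-nio-8088-exec-1] - INFO app-info  injectip ip 192.168.16.89"

print(answer_pattern.search(without).groupdict())
print(answer_pattern.search(with_app_info))  # None
```

The first call yields the same four fields as the OUTPUT shown below; the second returns None, matching "No Matches".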

EXAMPLE


Log without app-info:

2018-04-27 10:42:49 [http-nio-8088-exec-1] - INFO  injectip ip 192.168.16.89

You can filter it as per your requirements.

OUTPUT

{
  "data": [
    [
      "2018-04-27 10:42:49 [http-nio-8088-exec-1] - "
    ]
  ],
  "log": [
    [
      "INFO"
    ]
  ],
  "other_data": [
    [
      "  injectip ip "
    ]
  ],
  "ip": [
    [
      "192.168.16.89"
    ]
  ]
}

Now a log with app-info:

2018-04-27 10:42:49 [http-nio-8088-exec-1] - INFO app-info  injectip ip 192.168.16.89

OUTPUT

No Matches

Please test it here

EDIT 2:

If you set PATTERN1 to (?<data>^(?!.*app-info).*)

you will get:

{
  "data": [
    [
      "2018-04-27 10:42:49 [http-nio-8088-exec-1] - INFO  injectip ip 192.168.16.89"
    ]
  ]
}

You can then add a second grok filter that parses the data field:

grok {
  match => {"data" => "DEFINE PATTERN HERE"}
}
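A Python sketch of this two-stage approach. `STAGE2` is a hypothetical pattern standing in for "DEFINE PATTERN HERE" (the field names are assumptions; adjust them to your own layout), and `parse` is a hypothetical helper that chains the two stages.

```python
import re

# Stage 1: the answer's PATTERN1; "data" holds the whole line only if app-info is absent.
STAGE1 = re.compile(r"(?P<data>^(?!.*app-info).*)")

# Stage 2: hypothetical pattern applied to the "data" field.
STAGE2 = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} [\d:]{8}) \[(?P<threadname>[^\]]+)\] - "
    r"(?P<loglevel>\w+)\s+(?P<msg>.*)"
)

def parse(line):
    """Return the stage-2 fields, or None if either stage fails to match."""
    first = STAGE1.search(line)
    if first is None:
        return None  # in Grok, the next pattern in the match list would be tried
    second = STAGE2.search(first.group("data"))
    return second.groupdict() if second else None

without = "2018-04-27 10:42:49 [http-nio-8088-exec-1] - INFO  injectip ip 192.168.16.89"
with_app_info = "2018-04-27 10:42:49 [http-nio-8088-exec-1] - INFO app-info  injectip ip 192.168.16.89"

print(parse(without))
print(parse(with_app_info))  # None
```

Keeping the app-info check in its own stage means the second pattern only has to describe the field layout, not the exclusion.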

Upvotes: 1
