I have this multiline log file:
INFO 2017-07-01 12:01:56,987 [Thread-1] Class1:15 This is the message 1
DEBUG 2017-07-01 12:01:56,987 [Thread-1] Class2:15 This is the message 2
that is multiline!
WARN 2017-07-01 12:01:56,987 [Thread-1] Class3:15 This is a warn message
ERROR 2017-07-01 12:01:56,987 [Thread-1] Class4:15 This is an error with the stacktrace...
my.packkageName.MyException: exception!
at my.packkageName.Class4.process(Class4.java:11)
at ...
INFO 2017-07-01 12:01:56,987 [Thread-1] Class1:15 This is another INFO message
I want a regex that matches each message in the log separately, like this:
group 1: INFO 2017-07-01 12:01:56,987 [Thread-1] Class1:15 This is the message 1
group 2: DEBUG 2017-07-01 12:01:56,987 [Thread-1] Class2:15 This is the message 2
that is multiline!
group 3: WARN 2017-07-01 12:01:56,987 [Thread-1] Class3:15 This is a warn message
group 4: ERROR 2017-07-01 12:01:56,987 [Thread-1] Class4:15 This is an error with the stacktrace...
my.packkageName.MyException: exception!
at my.packkageName.Class4.process(Class4.java:11)
at ...
This regex only works for single-line messages:
(?:ERROR|DEBUG|INFO|WARN).++
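A minimal Java sketch of why this falls short. By default . does not match line terminators, so every match ends at the first newline, and continuation lines such as "that is multiline!" are never included. The class name SingleLineDemo and the shortened sample input are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SingleLineDemo {
    // Returns every match of the single-line pattern from the question.
    static List<String> findMessages(String log) {
        // Without DOTALL, '.' stops at '\n', so each match covers only one line.
        Pattern pattern = Pattern.compile("(?:ERROR|DEBUG|INFO|WARN).++");
        Matcher m = pattern.matcher(log);
        List<String> result = new ArrayList<>();
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String log = "DEBUG 2017-07-01 12:01:56,987 [Thread-1] Class2:15 This is the message 2\n"
                + "that is multiline!\n"
                + "INFO 2017-07-01 12:01:56,987 [Thread-1] Class1:15 This is another INFO message\n";
        // Prints two lines: the DEBUG header without its continuation, then the INFO line.
        for (String msg : findMessages(log)) {
            System.out.println(msg);
        }
    }
}
```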
Loading the log file into a string and using a regex to find the messages is probably not the most efficient way to process big log files.
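As a rough illustration of a more memory-friendly alternative, here is a sketch that reads the log line by line instead of loading it into one string. It assumes, as in the sample, that every message begins on a line starting with DEBUG, INFO, WARN or ERROR followed by a space, and that every other line continues the previous message. The class and method names are made up for this example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.function.Consumer;
import java.util.regex.Pattern;

public class LineByLineSplitter {
    // Assumption: a message starts on any line that begins with a level name followed by a space.
    private static final Pattern START = Pattern.compile("(?:DEBUG|INFO|WARN|ERROR) ");

    // Reads the log one line at a time and hands each complete message to onMessage,
    // so only the message currently being built is held in memory.
    static void split(BufferedReader reader, Consumer<String> onMessage) throws IOException {
        StringBuilder current = null;
        String line;
        while ((line = reader.readLine()) != null) {
            if (START.matcher(line).lookingAt()) {
                // A new header line closes the previous message.
                if (current != null) {
                    onMessage.accept(current.toString());
                }
                current = new StringBuilder(line);
            } else if (current != null) {
                // Continuation line: the rest of a multiline message or a stack trace frame.
                current.append('\n').append(line);
            }
            // Lines before the first header are ignored.
        }
        // The last message has no following header, so flush it at the end of input.
        if (current != null) {
            onMessage.accept(current.toString());
        }
    }

    public static void main(String[] args) throws IOException {
        String log = "INFO 2017-07-01 12:01:56,987 [Thread-1] Class1:15 This is the message 1\n"
                + "DEBUG 2017-07-01 12:01:56,987 [Thread-1] Class2:15 This is the message 2\n"
                + "that is multiline!\n"
                + "INFO 2017-07-01 12:01:56,987 [Thread-1] Class1:15 This is another INFO message\n";
        try (BufferedReader reader = new BufferedReader(new StringReader(log))) {
            split(reader, msg -> System.out.println("---\n" + msg));
        }
    }
}
```

To read a real file, the StringReader could be swapped for Files.newBufferedReader(Paths.get("app.log")), where app.log is a hypothetical path. Note that readLine() strips line terminators, so the messages are rejoined with '\n'.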
But if you're fine with a regex, and also want to capture that last message, you could do something like this:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogMessages {
    public static void main(String[] args) {
        String logstr = "INFO 2017-07-01 12:01:56,987 [Thread-1] Class1:15 This is the message 1\n"
                + "DEBUG 2017-07-01 12:01:56,987 [Thread-1] Class2:15 This is the message 2 \n"
                + " that is multiline!\n"
                + "WARN 2017-07-01 12:01:56,987 [Thread-1] Class3:15 This is a warn message\n"
                + "ERROR 2017-07-01 12:01:56,987 [Thread-1] Class4:15 This is an error with the stacktrace...\n"
                + "my.packkageName.MyException: exception!\n"
                + " at my.packkageName.Class4.process(Class4.java:11)\n"
                + " at ...\n"
                + "INFO 2017-07-01 12:01:56,987 [Thread-1] Class1:15 This is another INFO message ";
        // A message starts at a line beginning with 4+ capital letters (the level, captured in group 1)
        // and lazily runs up to the next line starting with 4 capital letters, or to the end of input (\z).
        final Pattern pattern = Pattern.compile("^([A-Z]{4,}).+?(?=(?:^[A-Z]{4}|\\z))", Pattern.DOTALL | Pattern.MULTILINE);
        Matcher messages = pattern.matcher(logstr);
        while (messages.find()) {
            System.out.println("---" + messages.group(1)); // the log level
            System.out.println(messages.group(0));         // the full message text
        }
    }
}
Because of Pattern.DOTALL, the . in .+? also matches line terminators.
And with Pattern.MULTILINE, ^ also matches after any line terminator except at the end of input.
The \z marks the end of the input.
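A small, self-contained check of these three points, using a made-up two-message input. The trailing comment on each line shows what that call prints; the class name FlagsDemo is illustrative.

```java
import java.util.regex.Pattern;

public class FlagsDemo {
    // True if the pattern finds a match anywhere in the input.
    static boolean found(Pattern p, String input) {
        return p.matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "INFO first\n continued\nWARN second";

        // Without DOTALL, '.' cannot cross the '\n' between "first" and " continued".
        System.out.println(found(Pattern.compile("first.+continued"), input));                 // false
        System.out.println(found(Pattern.compile("first.+continued", Pattern.DOTALL), input)); // true

        // Without MULTILINE, '^' only matches at the very start of the input.
        System.out.println(found(Pattern.compile("^WARN"), input));                            // false
        System.out.println(found(Pattern.compile("^WARN", Pattern.MULTILINE), input));         // true

        // In MULTILINE mode '$' matches before a line terminator, but \z only at the end of input.
        System.out.println(found(Pattern.compile("first$", Pattern.MULTILINE), input));        // true
        System.out.println(found(Pattern.compile("first\\z", Pattern.MULTILINE), input));      // false
        System.out.println(found(Pattern.compile("second\\z"), input));                        // true
    }
}
```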
I have found the solution.
The regex to be used is the following:
/(?:DEBUG|INFO|ERROR|WARN)[\s\S]+?(?=DEBUG|INFO|WARN|ERROR)/gm
This matches every "log message" that lies between the words DEBUG, INFO, ERROR or WARN, across multiple lines.
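A Java sketch of the same regex, for comparison with the Java answer. The /.../ delimiters and flags are JavaScript-style syntax, so in Java only the pattern body goes to Pattern.compile, and the g flag corresponds to calling find() in a loop (the m flag has no effect here because the pattern uses neither ^ nor $). On the sample from the question, the first four messages come out as expected. Because a match always ends right before the next occurrence of one of the four words, the final line yields only "INFO 2017-07-01 12:01:56,987 [Thread-1] Class1:15 This is another ", cut off where INFO appears again inside the message text. The class name is illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadSplit {
    // Same pattern body as above, without the JavaScript-style /.../gm wrapper.
    static final Pattern MESSAGE =
            Pattern.compile("(?:DEBUG|INFO|ERROR|WARN)[\\s\\S]+?(?=DEBUG|INFO|WARN|ERROR)");

    // Collects every match; looping over find() plays the role of the g flag.
    static List<String> messages(String log) {
        List<String> result = new ArrayList<>();
        Matcher m = MESSAGE.matcher(log);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String log = "INFO 2017-07-01 12:01:56,987 [Thread-1] Class1:15 This is the message 1\n"
                + "DEBUG 2017-07-01 12:01:56,987 [Thread-1] Class2:15 This is the message 2\n"
                + "that is multiline!\n"
                + "WARN 2017-07-01 12:01:56,987 [Thread-1] Class3:15 This is a warn message\n"
                + "ERROR 2017-07-01 12:01:56,987 [Thread-1] Class4:15 This is an error with the stacktrace...\n"
                + "my.packkageName.MyException: exception!\n"
                + "at my.packkageName.Class4.process(Class4.java:11)\n"
                + "at ...\n"
                + "INFO 2017-07-01 12:01:56,987 [Thread-1] Class1:15 This is another INFO message";
        for (String msg : messages(log)) {
            System.out.println("---");
            System.out.println(msg);
        }
    }
}
```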