Reputation: 571
I have the following data:
2018-03-20 23:28:47 INFO This is an info sample(can be multiline with new line characters)
2018-03-20 23:28:47 INFO This is an info sample(can be multiline with new line characters)
2018-03-20 23:28:47 DEBUG This is a debug sample(can be multiline with new line characters) {
'x':1,
'y':2,
'z':3,
'w':4
}
2018-03-20 23:28:47 INFO This is an info sample(can be multiline with new line characters)
2018-03-20 23:28:47 DEBUG This is a debug sample(can be multiline with new line characters){
'a':5,
'b':6,
'c':7,
'd':8
}
I need to extract all DEBUG statements, and for that I am using this regex: (\d{4}\-\d{2}\-\d{2}\ \d{2}\:\d{2}\:\d{2}\ DEBUG(.|\n|\r)*?)(?=\d{4}\-\d{2}\-\d{2}\ \d{2}\:\d{2}\:\d{2})
but it omits the last DEBUG statement. What should the regex be to obtain the following output?
2018-03-20 23:28:47 DEBUG This is a debug sample(can be multiline with new line characters) {
'x':1,
'y':2,
'z':3,
'w':4
}
2018-03-20 23:28:47 DEBUG This is a debug sample(can be multiline with new line characters){
'a':5,
'b':6,
'c':7,
'd':8
}
Upvotes: 2
Views: 609
Reputation: 2945
If you are sure that all the paragraphs with DEBUG will end with }, you can use:
r"(.*DEBUG[\s\S]*?\})"
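As a rough illustration of how the brace-terminated pattern behaves, here is a minimal Python sketch. The `log` string is a shortened, hypothetical version of the data in the question, and the names `brace_pattern` and `debug_blocks` are made up for this example.

```python
import re

# Shortened stand-in for the log in the question (hypothetical sample data).
log = (
    "2018-03-20 23:28:47 INFO This is an info sample\n"
    "2018-03-20 23:28:47 DEBUG This is a debug sample {\n"
    "'x':1,\n"
    "'y':2\n"
    "}\n"
    "2018-03-20 23:28:47 INFO This is an info sample\n"
    "2018-03-20 23:28:47 DEBUG This is a debug sample{\n"
    "'a':5,\n"
    "'b':6\n"
    "}\n"
)

# ".*DEBUG" stays on one line because "." does not match a newline by default;
# "[\s\S]*?" then lazily consumes everything up to the first closing brace.
brace_pattern = re.compile(r"(.*DEBUG[\s\S]*?\})")

debug_blocks = brace_pattern.findall(log)

for block in debug_blocks:
    print(block)
    print("---")
```

Note that this relies on every DEBUG entry containing a closing brace: an entry without one would keep consuming the following lines until the next `}` appears.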
If DEBUG may or may not have {}, the following regex should do the trick:
r"(.*DEBUG.*(?!=\{|\n))(\{[\s\S]*?\})?"
Upvotes: 1
Reputation: 626689
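I suggest:
- anchoring the pattern at the start of a line with ^ and the MULTILINE modifier (?m)
- adding a \Z alternative to the lookahead so that the last entry can match at the end of the string (same as Ken suggests in the comments)
- replacing the (.|\r|\n)*? pattern with .*? and adding a DOTALL modifier (?s)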
The whole fix will look like
(?sm)^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} DEBUG\s*(.*?)(?=[\r\n]+\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}|\Z)
See the regex demo.
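To try the pattern outside the demo, here is a minimal Python sketch. It assumes Python's `re` module, and the `log` string is a shortened, hypothetical version of the data in the question. The names `fixed`, `original`, `entries` and `original_entries` are made up for this example. The original regex from the question is run alongside for comparison.

```python
import re

# Shortened stand-in for the log in the question (hypothetical sample data).
log = "\n".join([
    "2018-03-20 23:28:47 INFO This is an info sample",
    "2018-03-20 23:28:47 DEBUG This is a debug sample {",
    "'x':1,",
    "'y':2",
    "}",
    "2018-03-20 23:28:47 INFO This is an info sample",
    "2018-03-20 23:28:47 DEBUG This is a debug sample{",
    "'a':5,",
    "'b':6",
    "}",
])

# The suggested fix: the lookahead accepts either the next timestamp or \Z.
fixed = re.compile(
    r"(?sm)^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} DEBUG\s*(.*?)"
    r"(?=[\r\n]+\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}|\Z)"
)

# The regex from the question: its lookahead requires a following timestamp.
original = re.compile(
    r"(\d{4}\-\d{2}\-\d{2}\ \d{2}\:\d{2}\:\d{2}\ DEBUG(.|\n|\r)*?)"
    r"(?=\d{4}\-\d{2}\-\d{2}\ \d{2}\:\d{2}\:\d{2})"
)

# group(0) is the whole entry including the timestamp;
# group(1) of the fixed pattern would be the message text only.
entries = [m.group(0) for m in fixed.finditer(log)]
original_entries = [m.group(0) for m in original.finditer(log)]

print(len(original_entries))  # the last DEBUG entry has no timestamp after it
print(len(entries))
```

On this sample the original pattern finds only the first DEBUG entry, because nothing follows the last one to satisfy its lookahead, while the fixed pattern finds both.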
Details
(?sm) - DOTALL and MULTILINE options on
^ - start of a line
\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} - a timestamp like pattern
DEBUG - a literal substring
\s* - 0+ whitespaces
(.*?) - Group 1: any 0+ chars, as few as possible, up to but excluding the text matched by the lookahead
(?=[\r\n]+\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}|\Z) - a positive lookahead that requires either
  [\r\n]+\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} - one or more CR or LF symbol(s) followed with a timestamp like pattern
  | - or
  \Z - the very end of the string
Upvotes: 3