Reputation: 23
I have a report of errors generated by my program, and I want to build a kind of whitelist from it. To do that, I want to extract every error that carries the tag "@Whitelist".
The report looks like this:
...
45)
Error: some description
Signal: lorem ipsum
@Whitelist
46)
Error: other description
Signal: lorem ipsum
File: some file
47)
Error: lorem ipsum
Project: X
@Whitelist
@Comment description why this is sent to the whitelist
...
Here I want to get errors 45 and 47, but not 46.
To sum up: I am looking for a regular expression that captures everything from the "Error" tag (which can also be "Warning" or "Message") up to and including the "@Comment" line with its message, but only if @Whitelist is present.
There can be N lines between the Error indicator and @Whitelist.
I can't come up with a good solution for this problem. Can anyone help? Many thanks in advance.
Edit: I just realized that the report format may change over time; for example, a headline could be added above a group of errors. Say errors 46 and 47 are of the same type: then the line "File Read Errors: " would appear above error 46. That's why I want a solution that finds each error based on the "Error|Warning|Message" tag and "@Whitelist". I hope this makes clear what I mean.
Upvotes: 0
Views: 87
Reputation: 8033
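One of @op's requirements can be formulated as: if a chunk contains a line starting with @Comment, anything following that line should be discarded.
I found it very difficult to fulfill this requirement and ended up with three regexes: one for splitting the input into numbered chunks, one for keeping only the chunks that contain Error: etc and @Whitelist (and discarding anything before Error: etc), and one for discarding anything after the @Comment line as stated above.
import re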
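# input_str is assumed to hold the full report text.
# Split the report into chunks at each numbered header such as "45)".
splitter = re.compile(r'\n+(?=\d+\)\n)')
# Keep everything from the first Error/Warning/Message line onward,
# but only if @Whitelist appears after it.
chunk_filter = re.compile(
    r'^(Error|Warning|Message):.*@Whitelist.*', re.DOTALL | re.MULTILINE)
# Keep the @Comment line itself and drop everything after it.
cleanup = re.compile(r'(^@Comment[^\n]*).*', re.DOTALL | re.MULTILINE)
for chunk in splitter.split(input_str):
    m = chunk_filter.search(chunk)
    if m:
        output = cleanup.sub(r'\1', m.group(0))
        print("Output begin")
        print(output)
        print("Output end\n")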
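To try the three-step approach end to end, here is a minimal sketch that wraps it in a function and runs it on a sample report. The function name extract_whitelisted, the constant names, and the sample text (including the trailing Summary line) are assumptions made for illustration; they are not taken from the real report.

```python
import re

# Hypothetical sample report that mirrors the layout shown in the question.
REPORT = """45)
Error: some description
Signal: lorem ipsum
@Whitelist
46)
Error: other description
Signal: lorem ipsum
File: some file
47)
Error: lorem ipsum
Project: X
@Whitelist
@Comment description why this is sent to the whitelist
Summary: 3 problems found
"""

# Step 1: split at newlines that are followed by a numbered header like "45)".
SPLITTER = re.compile(r'\n+(?=\d+\)\n)')
# Step 2: keep text from the tag line onward, only when @Whitelist is present.
KEEP = re.compile(
    r'^(Error|Warning|Message):.*@Whitelist.*', re.DOTALL | re.MULTILINE)
# Step 3: keep the @Comment line and cut everything after it.
CLEANUP = re.compile(r'(^@Comment[^\n]*).*', re.DOTALL | re.MULTILINE)


def extract_whitelisted(report):
    """Return the whitelisted entries of the report as a list of strings."""
    results = []
    for chunk in SPLITTER.split(report):
        m = KEEP.search(chunk)
        if m:
            results.append(CLEANUP.sub(r'\1', m.group(0)))
    return results


for entry in extract_whitelisted(REPORT):
    print(entry)
    print("---")
```

With this sample, entries 45 and 47 are returned, entry 46 is skipped, and the line after @Comment is dropped.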
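A lookahead assertion (?=...) matches only if the enclosed pattern matches next, without consuming any of the string. For example, Isaac (?=Asimov) will match 'Isaac ' only if it’s followed by 'Asimov'.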
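A quick way to see this lookahead behaviour in isolation is the following small sketch; the two input strings are made up for the demonstration.

```python
import re

pattern = re.compile(r'Isaac (?=Asimov)')

# The lookahead succeeds, but 'Asimov' itself is not part of the match.
matched = pattern.search('Isaac Asimov').group(0)

# The lookahead fails, so there is no match at all.
no_match = pattern.search('Isaac Newton')

print(repr(matched))   # 'Isaac '
print(no_match)        # None
```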
import re
# Match from a tag up to (but not including) @Comment or a blank line.
regex = re.compile(r'(Error|Warning|Message)[^)]*@Whitelist[^)]*(?=(@Comment|\n\n))')
for m in regex.finditer(input_str):
    print(m.group(0))
Error: some description
Signal: lorem ipsum
@Whitelist
Error: lorem ipsum
Project: X
@Whitelist
The idea is that each matching chunk should begin with Error, Warning or Message, contain @Whitelist, and end at either @Comment or an empty line (\n\n); the ending part itself is excluded from the match by the (?=...) lookahead.
Note that [^)]* is used so that a match cannot span multiple chunks at once (according to your examples, each chunk begins with a number followed by a closing parenthesis).
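To make the behaviour of this single pattern concrete, here is a small sketch that runs it on a sample report. The sample text is an assumption: it separates entries with blank lines, which the pattern relies on whenever an entry has no @Comment line.

```python
import re

# Hypothetical sample: numbered entries separated by blank lines.
report = (
    "45)\n"
    "Error: some description\n"
    "Signal: lorem ipsum\n"
    "@Whitelist\n"
    "\n"
    "46)\n"
    "Error: other description\n"
    "Signal: lorem ipsum\n"
    "File: some file\n"
    "\n"
    "47)\n"
    "Error: lorem ipsum\n"
    "Project: X\n"
    "@Whitelist\n"
    "@Comment description why this is sent to the whitelist\n"
)

regex = re.compile(
    r'(Error|Warning|Message)[^)]*@Whitelist[^)]*(?=(@Comment|\n\n))')

# Entry 46 has no @Whitelist before the next ")" header, so it cannot match.
matches = [m.group(0) for m in regex.finditer(report)]

for text in matches:
    print(repr(text))
```

Two things are worth noticing in the result: the @Comment line is not part of the match, because the lookahead only checks for it, and a closing parenthesis inside a description before @Whitelist would stop [^)]* and prevent that entry from matching.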
Upvotes: 1
Reputation: 82949
How about this non-regex solution: just split by double newline (i.e. empty lines) and check whether each block contains "@Whitelist":
for error in errors.split("\n\n"):
    if "\n@Whitelist" in error:
        print(error)
Or, if there are actually no blank lines, try this:
for error in re.split("\n(?=Error|Warning|Message)", errors):
    ...
IMHO, the more complex your error log becomes, the less likely a single regex is going to help you. Instead, you could use one regex for splitting the error messages and another regex for checking them, but currently not even that seems to be needed.
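The split-then-check idea could look roughly like the sketch below, which also copes with a headline such as "File Read Errors:" placed between entries. The function name whitelisted_entries, the sample report, and the rule that an entry ends at its last line starting with "@" are assumptions made for this illustration.

```python
import re

# Hypothetical sample with a headline inserted above entry 46.
REPORT = """45)
Error: some description
Signal: lorem ipsum
@Whitelist
File Read Errors:
46)
Error: other description
Signal: lorem ipsum
File: some file
47)
Warning: lorem ipsum
Project: X
@Whitelist
@Comment description why this is sent to the whitelist
"""

# Split at every newline that is followed by one of the three tags.
TAG_SPLIT = re.compile(r"\n(?=(?:Error|Warning|Message):)")


def whitelisted_entries(report):
    """Return whitelisted entries, trimmed after their last '@' tag line."""
    entries = []
    for block in TAG_SPLIT.split(report):
        lines = block.splitlines()
        if not any(line.startswith("@Whitelist") for line in lines):
            continue
        # Assumption: the last "@..." line closes the entry, so the next
        # number or headline that follows it is cut off.
        last_tag = max(i for i, line in enumerate(lines)
                       if line.startswith("@"))
        entries.append("\n".join(lines[:last_tag + 1]))
    return entries


for entry in whitelisted_entries(REPORT):
    print(entry)
    print("---")
```

Because the check is done with plain string methods, extending it later (for example to more tags) only means touching the split pattern or the per-line tests.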
Upvotes: 1