Reputation: 91
To analyze a log file, I need to extract exception types using Python and a regex.
The exception types always contain the substring "Exception".
The problem is that the substring "Exception" is not always at the end of their names.
Moreover, the exception type names contain an unknown number of dots.
Expected behaviour:
Input
"08-01-2021: There is a System.InvalidCalculationException - System reboots"
"09-01-2021: SuperSystem recognised a System.IO.WritingException ask user what to do next"
"10-01-2021: Oh no, not again an InternalException.NullReference.NonCritical.User we should fix it!"
Output
"System.InvalidCalculationException"
"System.IO.WritingException"
"InternalException.NullReference.NonCritical.User"
What should the regex look like?
I have tried "\w+[.]\w+[.]*Exception" for the exception types that end with "Exception".
But what if exception types contain even more dots and "Exception" is not at the end?
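To see where the attempted pattern falls short, here is a minimal check with Python's `re.findall` over the three sample lines above (the variable names `lines` and `attempt` are just for illustration):

```python
import re

# The three sample log lines from the question
lines = [
    "08-01-2021: There is a System.InvalidCalculationException - System reboots",
    "09-01-2021: SuperSystem recognised a System.IO.WritingException ask user what to do next",
    "10-01-2021: Oh no, not again an InternalException.NullReference.NonCritical.User we should fix it!",
]

# The pattern tried in the question: only "word.word" (plus optional dots)
# before "Exception", and nothing allowed after it
attempt = r"\w+[.]\w+[.]*Exception"

for line in lines:
    print(re.findall(attempt, line))
```

This prints `['System.InvalidCalculationException']`, then `['IO.WritingException']` (only the tail of the name, since there is no word boundary anchoring the start), then `[]`, because in the third line the type name continues after "Exception".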
Upvotes: 3
Views: 1021
Reputation: 11
Based on what you wrote, every exception type is a string of letters and dots.
I think this can solve your problem: "([A-Z][a-z]*.).([^\s]+)"
Check it in the linked demo.
Upvotes: 0
Reputation: 2862
How about:
[^\s]*Exception[^\s]*
(Demo)
The above requires the match to contain the word "Exception" and also includes any non-whitespace characters before or after it.
[^\s]* matches anything that is not (^) a whitespace character (\s), 0 to unlimited times (*).
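A minimal sketch of this pattern in Python, using `re.findall` on the sample lines from the question. One thing to be aware of: since `[^\s]` also matches punctuation, a comma or period directly after the type name ends up in the match (the last input string below is a made-up example to show this).

```python
import re

# Sample log lines from the question
lines = [
    "08-01-2021: There is a System.InvalidCalculationException - System reboots",
    "09-01-2021: SuperSystem recognised a System.IO.WritingException ask user what to do next",
    "10-01-2021: Oh no, not again an InternalException.NullReference.NonCritical.User we should fix it!",
]

# Any run of non-whitespace characters that contains "Exception"
pattern = r"[^\s]*Exception[^\s]*"

for line in lines:
    print(re.findall(pattern, line))

# Trailing punctuation is kept, because [^\s] matches it too
print(re.findall(pattern, "Got a FooException, retrying"))
```

The three sample lines give exactly the expected type names, while the last call returns `['FooException,']` with the comma attached.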
Upvotes: 1
Reputation: 626845
You can use
\b(?:[A-Za-z]+\.)*[A-Za-z]*Exception(?:\.[A-Za-z]+)*\b
\b(?:\w+\.)*\w*Exception(?:\.\w+)*\b
See the regex demo / regex demo #2. Details:
\b - a word boundary
(?:[A-Za-z]+\.)* - zero or more occurrences of one or more letters followed by a dot
[A-Za-z]* - zero or more letters
Exception - the string Exception
(?:\.[A-Za-z]+)* - zero or more repetitions of a dot followed by one or more letters
\b - a word boundary.
The \w in the second pattern matches any letter, digit or underscore.
Python usage:
import re
re.findall(r'\b(?:\w+\.)*\w*Exception(?:\.\w+)*\b', text)
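For a self-contained sketch, here is the `\w`-based pattern applied to the sample lines from the question. The helper name `extract_exception_types` is hypothetical, not part of the answer. Because of the trailing `\b`, punctuation right after the name (such as a comma) is not pulled into the match.

```python
import re

# \w-based pattern from the answer above, compiled once
EXCEPTION_TYPE = re.compile(r"\b(?:\w+\.)*\w*Exception(?:\.\w+)*\b")

def extract_exception_types(text):
    """Return dotted names in text that have a segment ending in 'Exception'."""
    # The pattern has only non-capturing groups, so findall returns whole matches
    return EXCEPTION_TYPE.findall(text)

lines = [
    "08-01-2021: There is a System.InvalidCalculationException - System reboots",
    "09-01-2021: SuperSystem recognised a System.IO.WritingException ask user what to do next",
    "10-01-2021: Oh no, not again an InternalException.NullReference.NonCritical.User we should fix it!",
]

for line in lines:
    print(extract_exception_types(line))

# The trailing \b stops before the comma
print(extract_exception_types("Got a FooException, retrying"))
```

Each sample line yields its single expected type name, and the last call returns `['FooException']`.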
Upvotes: 1