Reputation: 3455
I need help with the re module. I have this pattern:
pattern = re.compile(r'''first_condition\((.*)\)
extra_condition\((.*)\)
testing\((.*)\)
other\((.*)\)''', re.UNICODE)
This is what happens when I run the regex on the following text:
text = '''first_condition(enabled)
extra_condition(disabled)
testing(example)
other(something)'''
result = pattern.findall(text)
print(result)
[('enabled', 'disabled', 'example', 'something')]
But if one or two lines are missing, the regex returns an empty list. For example, if my text is:
text = '''first_condition(enabled)
other(other)'''
What I want to get:
[('enabled', '', '', 'other')]
I could do it in several commands, but I think that would be slower than doing it in one regex. The original code uses sed, so it is very fast. I could keep using sed, but I need a cross-platform way to do it. Is this possible? Thanks!
P.S. It would also be great if the order of the lines could be free rather than fixed:
text = '''other(other)
first_condition(enabled)'''
must return exactly the same result:
[('enabled', '', '', 'other')]
Upvotes: 2
Views: 4821
Reputation: 76695
Use a non-capturing group for the optional part, and make the group optional by putting a question mark after it.
Example:
pat = re.compile(r'a\(([^)]+)\)(?:b\((?P<bgr>[^)]+)\))?')
Sorry but I can't test this right now.
The above requires a string like a(foo) and grabs the text in the parens as group 1 (group 0 is always the whole match). Then it optionally matches a string like b(foo), and if that is matched, the text in its parens is saved as a named group with the name bgr.
Note that I didn't use .* to match inside the parens but [^)]+ instead. This definitely stops matching when it reaches the closing paren, and requires at least one character. You could use [^)]* if the parens can be empty.
These patterns are getting complicated so you might want to use verbose patterns with comments.
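As a rough illustration of a verbose pattern, the a(...)/b(...) example could be written with re.VERBOSE as below. This is only a sketch: the layout and comments are one possible style, and the optional group is closed before the question mark so that the pattern compiles.

```python
import re

pat = re.compile(r'''
    a\(([^)]+)\)             # required a(...), contents captured as group 1
    (?:                      # non-capturing group around the optional part
        b\((?P<bgr>[^)]+)\)  # b(...), contents captured as named group bgr
    )?                       # the whole b(...) part may be absent
''', re.VERBOSE)

both = pat.match('a(foo)b(bar)')
only_a = pat.match('a(foo)')

print(both.group(1), both.group('bgr'))      # foo bar
print(only_a.group(1), only_a.group('bgr'))  # foo None
```

In verbose mode, whitespace in the pattern is ignored and everything after an unescaped # is a comment, so the pattern can be spread over several lines.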
To have several optional patterns that might appear in any order, put them all inside a non-capturing group and separate them with vertical bars. You will need to use named match groups because you won't know the order. Put an asterisk after the non-capturing group to allow any number of the alternative patterns to be present (including zero if none are present).
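A minimal sketch of that idea, using the four key names from the question. The helper name extract, the KEYS list, and the \s* that skips the newline between lines are my own assumptions, not part of the original suggestion.

```python
import re

KEYS = ['first_condition', 'extra_condition', 'testing', 'other']

# One alternative per key, e.g. other\((?P<other>[^)]*)\)
alternatives = '|'.join(rf'{key}\((?P<{key}>[^)]*)\)' for key in KEYS)

# Non-capturing group of alternatives plus optional whitespace, repeated with *
pattern = re.compile(rf'(?:(?:{alternatives})\s*)*')

def extract(text):
    """Return the captured values in KEYS order, '' for missing keys."""
    match = pattern.match(text)  # always matches, possibly an empty string
    return [match.group(key) or '' for key in KEYS]

print(extract("other(other)\nfirst_condition(enabled)"))
# ['enabled', '', '', 'other']
```

This relies on Python's re keeping the value a named group captured in an earlier repetition. Matching stops at the first line that is not one of the known keys, and if a key appears twice the last value wins.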
Upvotes: 0
Reputation: 838106
I would parse it into a dictionary first:
import re
keys = ['first_condition', 'extra_condition', 'testing', 'other']
d = dict(re.findall(r'^(.*)\((.*)\)$', text, re.M))
result = [d.get(key, '') for key in keys]
See it working online: ideone
Upvotes: 4