ghostmansd

Reputation: 3455

Python: regex: find if exists, else ignore

I need help with the re module. I have this pattern:

import re

# Raw string so that \( and \) reach the regex engine unchanged
pattern = re.compile(r'''first_condition\((.*)\)
extra_condition\((.*)\)
testing\((.*)\)
other\((.*)\)''', re.UNICODE)

This is what I get when I run the regex on the following text:

text = '''first_condition(enabled)
extra_condition(disabled)
testing(example)
other(something)'''
result = pattern.findall(text)
print(result)
[('enabled', 'disabled', 'example', 'something')]

But if one or two lines are missing, the regex returns an empty list. E.g. if my text is:

text = '''first_condition(enabled)
other(other)'''

What I want to get:

[('enabled', '', '', 'other')]

I could do it in several steps, but I think that would be slower than doing it in one regex. The original code uses sed, so it is very fast. I could keep using sed, but I need a cross-platform way to do it. Is this possible? Thanks!

P.S. It would also be great if the order of the lines could be free rather than fixed:

text = '''other(other)
first_condition(enabled)'''

must return exactly the same:

[('enabled', '', '', 'other')]

Upvotes: 2

Views: 4821

Answers (2)

steveha

Reputation: 76695

Use a non-capturing group, (?:...), for the optional parts, and make the group optional by putting a question mark after it.

Example:

pat = re.compile(r'a\(([^)]+)\)(?:b\((?P<bgr>[^)]+)\))?')

Sorry but I can't test this right now.

The above requires a string like a(foo) and grabs the text in the parens as group 1.

Then it optionally matches a string like b(foo), and if that matches, the text in the parens is saved as a named group called bgr.

Note that I didn't use .* to match inside the parens but [^)]+. This definitely stops matching when it reaches the closing paren, and requires at least one character. You could use [^)]* if the parens can be empty.
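A minimal sketch of the optional-group idea, written with the non-capturing group closed before the question mark. The sample strings are made up for illustration; the last line shows why .* is risky inside the parens.

```python
import re

# a(...) is required; b(...) is optional and, when present, lands in the named group "bgr".
pat = re.compile(r'a\(([^)]+)\)(?:b\((?P<bgr>[^)]+)\))?')

both = pat.match('a(foo)b(bar)')
print(both.group(1), both.group('bgr'))      # foo bar

only_a = pat.match('a(foo)')
print(only_a.group(1), only_a.group('bgr'))  # foo None

# A greedy .* runs on to the last closing paren instead of the first one.
greedy = re.match(r'a\((.*)\)', 'a(foo)b(bar)')
print(greedy.group(1))                       # foo)b(bar
```

An optional group that did not take part in the match comes back as None from group(), so supply a default if you need an empty string.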

These patterns are getting complicated so you might want to use verbose patterns with comments.
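As a rough sketch, here is how the pattern from the question might look in verbose form (re.VERBOSE), with the two middle lines made optional. The group names and the choice of [^)]* (which allows empty parens) are assumptions made for this example, and the lines must still appear in the fixed order.

```python
import re

pattern = re.compile(r'''
    first_condition\((?P<first>[^)]*)\)\n          # required first line
    (?:extra_condition\((?P<extra>[^)]*)\)\n)?     # optional line
    (?:testing\((?P<testing>[^)]*)\)\n)?           # optional line
    other\((?P<other>[^)]*)\)                      # required last line
''', re.VERBOSE)

full_text = '''first_condition(enabled)
extra_condition(disabled)
testing(example)
other(something)'''

short_text = '''first_condition(enabled)
other(other)'''

full_result = pattern.findall(full_text)
short_result = pattern.findall(short_text)
print(full_result)   # [('enabled', 'disabled', 'example', 'something')]
print(short_result)  # [('enabled', '', '', 'other')]
```

findall() reports groups that did not participate as empty strings, which happens to be the shape asked for in the question.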

To have several optional patterns that might appear in any order, put them all inside a non-capturing group and separate them with vertical bars. You will need to use named groups because you won't know the order. Put an asterisk after the non-capturing group to allow any number of the alternatives to be present (including zero if none are present).
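The any-order approach could be sketched like this. The group names mirror the keys from the question, and the trailing \s* that swallows the newline between lines is an assumption added for this example, not part of the description above.

```python
import re

KEYS = ['first_condition', 'extra_condition', 'testing', 'other']

any_order = re.compile(r'''
    (?:
        (?: first_condition\((?P<first_condition>[^)]*)\)
          | extra_condition\((?P<extra_condition>[^)]*)\)
          | testing\((?P<testing>[^)]*)\)
          | other\((?P<other>[^)]*)\)
        )
        \s*     # skip the newline (or other whitespace) after each entry
    )*
''', re.VERBOSE)

text = '''other(other)
first_condition(enabled)'''

m = any_order.match(text)
# Groups that never matched are None; turn them into empty strings.
result = tuple(m.group(key) or '' for key in KEYS)
print(result)  # ('enabled', '', '', 'other')
```

If the same key appears more than once, each named group keeps only the value from its last occurrence.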

Upvotes: 0

Mark Byers

Reputation: 838106

I would parse it to a dictionary first:

import re

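keys = ['first_condition', 'extra_condition', 'testing', 'other']
# `text` is the input string from the question; each "name(value)" line becomes a dict entry
d = dict(re.findall(r'^(.*)\((.*)\)$', text, re.M))
# Look up the keys in a fixed order, using '' for any line that is missing
result = [d.get(key, '') for key in keys]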

See it working online: ideone
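For illustration, the same dictionary approach wrapped in a small function and run on both sample texts from the question. The function name extract is hypothetical, and the result is wrapped in a one-element list of tuples only to mirror the output format the question asks for.

```python
import re

KEYS = ['first_condition', 'extra_condition', 'testing', 'other']

def extract(text, keys=KEYS):
    """Return [(value, ...)] in key order, with '' for lines that are absent."""
    found = dict(re.findall(r'^(.*)\((.*)\)$', text, re.M))
    return [tuple(found.get(key, '') for key in keys)]

in_order = extract('''first_condition(enabled)
other(other)''')

reversed_order = extract('''other(other)
first_condition(enabled)''')

print(in_order)        # [('enabled', '', '', 'other')]
print(reversed_order)  # [('enabled', '', '', 'other')]
```

Because the lookup goes through a dictionary, the order of the lines in the input does not matter.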

Upvotes: 4
