npatel

Reputation: 1111

Parse multi-line output with a regular expression

I am trying to parse the following multi-line output with a regex:

>>> a = """
... Feature 101
... Learning: Yes
... --------------
... Feature 102
... Learning: No
... """

I only get one pair of values back. Since I am passing re.MULTILINE|re.DOTALL, shouldn't it return both?

>>> import re
>>> re.findall('.*Feature\s*(\d+).*Learning\s*:\s*(\w+).*', a, re.MULTILINE|re.DOTALL)
[('102', 'No')]

Appreciate the help!

Upvotes: 0

Views: 38

Answers (1)

janos

Reputation: 124648

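The problem is the greedy .* (all three of them in the regex). With re.DOTALL, . also matches newlines, so the leading .* consumes as much of the input as it can and backtracks only as far as the last Feature; the whole string then ends up as a single match. (re.MULTILINE only changes how ^ and $ behave, so it has no effect on this pattern.) If you make all three non-greedy by appending a ? (that is, change them to .*?), you'll get all the results you expected: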

>>> re.findall(r'.*?Feature\s*(\d+).*?Learning\s*:\s*(\w+).*?', a, re.MULTILINE|re.DOTALL)
[('101', 'Yes'), ('102', 'No')]
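
To see what the greedy version is doing, here is a small sketch you can run. It checks that the original pattern swallows the entire input in one match, and then shows a possible alternative that avoids .* altogether. The alternative assumes that each Learning line directly follows its Feature line, as in your sample; the names greedy and simple are only for illustration.

```python
import re

a = """
Feature 101
Learning: Yes
--------------
Feature 102
Learning: No
"""

# Greedy pattern from the question: with re.DOTALL, one match covers the
# whole string, so findall has nothing left to search for a second match.
greedy = re.search(r'.*Feature\s*(\d+).*Learning\s*:\s*(\w+).*', a, re.DOTALL)
print(greedy.span() == (0, len(a)))  # True
print(greedy.groups())               # ('102', 'No')

# Alternative without any wildcard: \s+ bridges the newline between the
# two lines, so no flags are needed at all.
simple = re.findall(r'Feature\s+(\d+)\s+Learning\s*:\s*(\w+)', a)
print(simple)                        # [('101', 'Yes'), ('102', 'No')]
```

If other lines can appear between Feature and Learning, the non-greedy version above is probably the better fit.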

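Also, it's good practice to write regular expressions as raw strings (r'...'), so that Python does not process backslash sequences such as \s and \d before the re module sees them.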
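
A quick illustration of where a missing r actually changes the result, using \b (word boundary) as an example that is not part of your original pattern. In a normal string literal Python turns \b into a backspace character before re ever sees it:

```python
import re

plain = '\bFeature\b'   # \b becomes a backspace character (one char each)
raw = r'\bFeature\b'    # \b stays as backslash + b, a regex word boundary

print(len(plain), len(raw))              # 9 11
print(re.findall(plain, 'Feature 101'))  # []
print(re.findall(raw, 'Feature 101'))    # ['Feature']
```

Sequences like \s and \d happen to survive in a normal string because Python does not recognise them as escapes, but relying on that is fragile.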

Upvotes: 2
