Reputation: 63516
I want to match the following input. How would I match a group a certain number of times without using a multiline string? Something like (^(\d+) (.+)$){3} (but that doesn't work).
sample_string = """Breakpoint 12 reached
90 good morning
91 this is cool
92 this is bananas
"""
pattern_for_continue = re.compile("""Breakpoint \s (\d+) \s reached \s (.+)$
^(\d+)\s+ (.+)\n
^(\d+)\s+ (.+)\n
^(\d+)\s+ (.+)\n
""", re.M|re.VERBOSE)
matchobj = pattern_for_continue.match(sample_string)
print matchobj.group(0)
Upvotes: 1
Views: 2433
Reputation: 49003
You need something more like this:
import re
sample_string = """Breakpoint 12 reached
90 hey this is a great line
91 this is cool too
92 this is bananas
"""
pattern_for_continue = re.compile(r"""
Breakpoint\s+(\d+)\s+reached[ ]*\n   # header line; trailing spaces are optional
(\d+)[ ]+([^\n]+?)\n                 # a bare space is ignored under VERBOSE, so use a character class
(\d+)[ ]+([^\n]+?)\n
(\d+)[ ]+([^\n]+?)\n
""", re.MULTILINE|re.VERBOSE)
matchobj = pattern_for_continue.match(sample_string)
for i in range(1, 8):
    print(i, matchobj.group(i))
print("Entire match:")
print(matchobj.group(0))
1 12
2 90
3 hey this is a great line
4 91
5 this is cool too
6 92
7 this is bananas
Entire match:
Breakpoint 12 reached
90 hey this is a great line
91 this is cool too
92 this is bananas
Under re.VERBOSE, whitespace in the pattern is ignored, so you have to match whitespace explicitly in your regex. I partially fixed this by left-justifying your data in the multiline string. I think this is justified because you probably don't have the leading whitespace in real code; it's likely an artifact of editing in a multiline string.
You need to replace $ with \n.
You need non-greedy matches.
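The point about re.VERBOSE can be checked in isolation. The snippet below is a minimal sketch using made-up two-letter patterns (they are not from the question); it shows that a bare space in a verbose pattern is dropped, while [ ] or \s still matches a space.

```python
import re

# Under re.VERBOSE, unescaped whitespace in the pattern is ignored,
# so r"a b" is compiled as if it were r"ab".
bare = re.compile(r"a b", re.VERBOSE)
explicit = re.compile(r"a[ ]b", re.VERBOSE)  # a character class keeps the space
escaped = re.compile(r"a\s+b", re.VERBOSE)   # \s is unaffected by VERBOSE

print(bare.match("a b"))                   # None: the pattern is really "ab"
print(bare.match("ab") is not None)        # True
print(explicit.match("a b") is not None)   # True
print(escaped.match("a b") is not None)    # True
```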
Upvotes: 1
Reputation: 1121446
There are a series of problems with your expression and sample:
Because you use VERBOSE, literal spaces in the pattern are ignored, so the spaces around the digits on the first line match nothing. Replace spaces with \s or [ ] (the latter matches only a literal space; the former also matches newlines and tabs).
Your input sample has whitespace before the digit on each line but your example pattern requires that the digits are at the start of the line. Either allow for that whitespace or fix your sample input.
The biggest problem is that a capturing group inside a repeating group (so (\d+) inside of a larger group with {3} at the end) only captures the last match. You'll get 92 and this is bananas, not the previous two matched lines.
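The last point is easy to reproduce. This is a small sketch, assuming the three numbered lines from the question without leading whitespace; the variable names are illustrative only.

```python
import re

lines = "90 good morning\n91 this is cool\n92 this is bananas\n"

# The outer (non-capturing) group repeats three times, but capture
# groups 1 and 2 are overwritten on every repetition.
repeated = re.match(r"(?:(\d+)[ ]([^\n]+)\n){3}", lines)

print(repeated.groups())          # ('92', 'this is bananas')
print(repeated.group(0) == lines) # True: all three lines matched, only the last was kept
```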
To overcome all that, you have to repeat that pattern for the three lines explicitly. You could use Python to implement that repetition:
linepattern = r'[ ]* (\d+) [ ]+ ([^\n]+)\n'
pattern_for_continue = re.compile(r"""
Breakpoint [ ]+ (\d+) [ ]+ reached [ ]* ([^\n]*?)\n
{}
""".format(linepattern * 3), re.MULTILINE|re.VERBOSE)
Which, for your sample input, returns:
>>> pattern_for_continue.match(sample_string).groups()
('12', '', '90', 'hey this is a great line', '91', 'this is cool too', '92', 'this is bananas')
If you really do not want to match spaces before the digits on the 3 extra lines, you can remove the first [ ]* pattern from linepattern.
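If the number of lines varies, the same repetition trick can be wrapped in a function. The following is only a sketch: build_breakpoint_pattern is a hypothetical helper name, and the sample assumes the header line has no trailing text.

```python
import re

def build_breakpoint_pattern(line_count):
    """Hypothetical helper: a header line followed by `line_count` numbered lines."""
    header = r"Breakpoint [ ]+ (\d+) [ ]+ reached [ ]* ([^\n]*?) \n"
    line = r"[ ]* (\d+) [ ]+ ([^\n]+) \n"
    # Repeating the line pattern gives every line its own capture groups.
    return re.compile(header + line * line_count, re.VERBOSE)

sample = (
    "Breakpoint 12 reached\n"
    "90 good morning\n"
    "91 this is cool\n"
    "92 this is bananas\n"
)

match = build_breakpoint_pattern(3).match(sample)
print(match.groups())
# ('12', '', '90', 'good morning', '91', 'this is cool', '92', 'this is bananas')
```

Plain concatenation is used here instead of str.format so that literal braces in a pattern (such as a {3} quantifier) would not need escaping.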
Upvotes: 4