regex: match exactly three lines

Question

I want to match the following input. How would I match a group a certain number of times without using a multiline string? Something like (^(\d+) (.+)$){3} (but that doesn't work).

import re

sample_string = """Breakpoint 12 reached 
         90  good morning
     91  this is cool
     92  this is bananas
     """
pattern_for_continue = re.compile(r"""Breakpoint \s (\d+) \s reached \s (.+)$
                                 ^(\d+)\s+  (.+)

                                 ^(\d+)\s+  (.+)

                                 ^(\d+)\s+  (.+)

                                  """, re.M|re.VERBOSE)
matchobj = pattern_for_continue.match(sample_string)
print(matchobj.group(0))

Martijn Pieters · Accepted Answer

There are a series of problems with your expression and sample:

Because you use VERBOSE, all unescaped whitespace in the pattern is ignored, so the spaces around the digits on the first line are ignored too. Replace those spaces with \s or [ ] (the latter matches only a literal space; the former also matches tabs and newlines).
Your sample input has whitespace before the digits on each line, but your pattern requires the digits to be at the start of the line. Either allow for that whitespace or fix your sample input.
The biggest problem is that a capturing group inside a repeated group (here (\d+) inside a larger group with {3} at the end) only captures the last match. You'll get 92 and this is bananas, not the two lines matched before them.
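The problems listed above can be reproduced in a few lines. This is a minimal sketch: the three-line input is the sample with its indentation stripped, made up purely for illustration. It shows that VERBOSE drops unescaped spaces, that a capturing group inside a repeated group keeps only its last iteration, and one further reason the question's pattern fails: $ and ^ are zero-width, so nothing in that pattern ever consumes the newline between lines.

```python
import re

# Under re.VERBOSE, unescaped whitespace in the pattern is ignored:
# "a b" is compiled as "ab", while "a[ ]b" still requires a literal space.
verbose_space = re.match(r"a b", "a b", re.VERBOSE)
verbose_class = re.match(r"a[ ]b", "a b", re.VERBOSE)
print(verbose_space)               # None
print(verbose_class is not None)   # True

lines = "90  good morning\n91  this is cool\n92  this is bananas\n"

# A capturing group inside a repeated group only keeps its last iteration.
repeated = re.match(r"(?:(\d+)[ ]+([^\n]+)\n){3}", lines)
print(repeated.groups())  # ('92', 'this is bananas')

# "$" stops before the newline and "^" cannot match there, so the
# repeated group from the question never gets past the first line.
question_style = re.match(r"(^(\d+) (.+)$){3}", lines, re.MULTILINE)
print(question_style)  # None
```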

To overcome all that, you have to repeat the line pattern explicitly, once for each of the three lines. You can let Python do that repetition for you:

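import re

# One numbered line: optional indent, the number, spaces, then the rest of the line.
linepattern = r'[ ]* (\d+) [ ]+ ([^\n]+)\n'

pattern_for_continue = re.compile(r"""
    Breakpoint [ ]+ (\d+) [ ]+ reached [ ]+ ([^\n]*?)\n
    {}
""".format(linepattern * 3), re.MULTILINE | re.VERBOSE)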

Which, for your sample input, returns:

>>> pattern_for_continue.match(sample_string).groups()
('12', '', '90', 'good morning', '91', 'this is cool', '92', 'this is bananas')

If you really do not want to allow spaces before the digits on the three extra lines, you can remove the leading [ ]* from linepattern.
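The string multiplication in the answer (linepattern * 3) generalizes to any fixed number of lines. The helper below is a sketch, and its names (build_continue_pattern, n, allow_indent) are hypothetical, not part of the answer. It builds the same verbose pattern, and allow_indent=False drops the leading [ ]* as described above. It assumes the same input layout as the sample: a "Breakpoint N reached " line followed by numbered lines, each ending in a newline.

```python
import re

def build_continue_pattern(n, allow_indent=True):
    """Hypothetical helper: compile the answer's pattern for exactly n numbered lines."""
    indent = r"[ ]*" if allow_indent else ""
    # One numbered line: optional indent, the number, spaces, then the rest of the line.
    linepattern = indent + r" (\d+) [ ]+ ([^\n]+)\n"
    return re.compile(r"""
        Breakpoint [ ]+ (\d+) [ ]+ reached [ ]+ ([^\n]*?)\n
        {}
    """.format(linepattern * n), re.MULTILINE | re.VERBOSE)

# Same text as the question's sample, written with explicit "\n" so the
# trailing space after "reached" cannot be lost.
sample_string = (
    "Breakpoint 12 reached \n"
    "         90  good morning\n"
    "     91  this is cool\n"
    "     92  this is bananas\n"
)

m = build_continue_pattern(3).match(sample_string)
print(m.groups())
# ('12', '', '90', 'good morning', '91', 'this is cool', '92', 'this is bananas')

# Without the indent allowance, the indented sample lines no longer match.
print(build_continue_pattern(3, allow_indent=False).match(sample_string))  # None
```

Because each line is spelled out in the compiled pattern, every line gets its own pair of groups, which sidesteps the "last iteration only" behaviour of a {3} quantifier.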
