df'

Reputation: 300

Regular expression: repeating groups only getting last group

My data:

stack: 123 overflow: 456 others: - st: 7 ov: 7 againothers: - m: 11 t: 12 - m: 13 t: 14 - m: 15 t: 16 - st: 8 ov: 8 againothers: - m: 17 t: 18 end: 42

My regular expression:

^stack: (\d+) overflow: (\d+) others: ?(.+) end: (\d+)$

Which matches the groups as:

1: 123
2: 456
3: - st: 7 ov: 7 againothers: - m: 11 t: 12 - m: 13 t: 14 - m: 15 t: 16 - st: 8 ov: 8 againothers: - m: 17 t: 18
4: 42

Good so far. On group 3 I then run the following regular expression:

^(?:- st: (\d+) ov: (\d+) againothers: ?(?: - m: (\d+) t: (\d+))+)+$

That does not work at all (why?), so I removed the ^ and $, and then it matches. The matches then look like this:

1: 7     // <-- Works as expected.
2: 7
3: 15    // <-- Here I'd expected 2 groups matching: (13,14), (15,16)
4: 16    // <-- but I'm only getting the last group.
1: 8     // <-- This works and the remainder is as expected.
2: 8
3: 17
4: 18

I seem to be missing "13, 14" from my inner group, (?: - m: (\d+) t: (\d+))+, which should match one or more m/t combinations.

Online test: http://gskinner.com/RegExr/?33urf, in case that gets butchered, my data there is: - st: 7 ov: 7 againothers: - m: 11 t: 12 - m: 13 t: 14 - m: 15 t: 16 - st: 8 ov: 8 againothers: - m: 17 t: 18 and the regex is: (?:- st: (\d+) ov: (\d+) againothers: ?(?: - m: (\d+) t: (\d+))+)+.

I've read http://www.regular-expressions.info/captureall.html, and I think my problem is related to that? Any tips/pointers/help so I can match one or more m:t: combinations?

Upvotes: 3

Views: 4853

Answers (2)

user1919238

Reputation:

Most regex engines do not allow multiple captures from the same set of parentheses within a repeating group. If capturing parentheses match more than once, you get what matched last as the result.

The simplest work-around is to make a regex for only that sub-pattern and then get the results captured from each time it matches.

In other words, first get the relevant portion of the string and then use a regex like this on it:

/ - m: (\d+) t: (\d+)/

(Use whatever mechanism your language provides for finding all matches.)
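
As a rough sketch of that two-step approach in Python (the question does not say which language is in use, so Python, the names `block_re`, `pair_re`, and `parse_blocks`, and the extra capturing group around the repeated m/t run are all assumptions made for illustration): the outer pattern captures the whole run of m/t pairs as one string, and the inner pattern is then applied to that string with `re.findall`.

```python
import re

# Sample data copied from the question (the content of group 3).
data = ("- st: 7 ov: 7 againothers: - m: 11 t: 12 - m: 13 t: 14 - m: 15 t: 16 "
        "- st: 8 ov: 8 againothers: - m: 17 t: 18")

# Outer pattern: one st/ov block. Group 3 wraps the *whole* repeated run of
# m/t pairs, so nothing is lost when the inner group repeats.
block_re = re.compile(
    r"- st: (\d+) ov: (\d+) againothers: ?((?: - m: \d+ t: \d+)+)"
)

# Inner pattern: a single m/t pair, applied repeatedly to the captured run.
pair_re = re.compile(r"- m: (\d+) t: (\d+)")


def parse_blocks(text):
    """Return one dict per st/ov block, with every m/t pair it contains."""
    blocks = []
    for block in block_re.finditer(text):
        pairs = [(int(m), int(t)) for m, t in pair_re.findall(block.group(3))]
        blocks.append({
            "st": int(block.group(1)),
            "ov": int(block.group(2)),
            "pairs": pairs,
        })
    return blocks


for parsed in parse_blocks(data):
    print(parsed)
```

With the sample data this yields the pairs (11, 12), (13, 14), (15, 16) for the first block and (17, 18) for the second. Other languages have an equivalent "find all" call that can be used the same way.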

Upvotes: 3

stema

Reputation: 93086

Your groups get the following numbers:

^(?:- st: (\d+) ov: (\d+) againothers: ?(?: - m: (\d+) t: (\d+))+)+$
          1         2                            3        4

They are numbered in the order of their opening parentheses.

If a repeated group matches a second time, the content previously stored in its capturing groups is overwritten.

You are repeating a capturing group.
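
A small demonstration of that overwriting, using Python's `re` module purely as an example engine (the variable names are made up for this sketch): the repeated group matches three times, but only the last iteration is kept in groups 1 and 2.

```python
import re

# The m/t run from the first block of the question's data.
inner = " - m: 11 t: 12 - m: 13 t: 14 - m: 15 t: 16"

# Same inner group as in the question, repeated with "+".
repeated = re.compile(r"(?: - m: (\d+) t: (\d+))+")

match = repeated.fullmatch(inner)

# The group iterated three times; each pass overwrote groups 1 and 2.
last_only = match.groups()
print(last_only)  # ('15', '16')
```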

As far as I know, in .NET it is possible to access all those matches (through a group's Captures collection), but in most other regex implementations the group content is overwritten.

Upvotes: 2
