Reputation: 300
My data:
stack: 123 overflow: 456 others: - st: 7 ov: 7 againothers: - m: 11 t: 12 - m: 13 t: 14 - m: 15 t: 16 - st: 8 ov: 8 againothers: - m: 17 t: 18 end: 42
My regular expression:
^stack: (\d+) overflow: (\d+) others: ?(.+) end: (\d+)$
Which matches the groups as:
1: 123
2: 456
3: - st: 7 ov: 7 againothers: - m: 11 t: 12 - m: 13 t: 14 - m: 15 t: 16 - st: 8 ov: 8 againothers: - m: 17 t: 18
4: 42
Good so far. On group 3 I then run the following regular expression:
^(?:- st: (\d+) ov: (\d+) againothers: ?(?: - m: (\d+) t: (\d+))+)+$
That does not work at all (why?), so I remove the ^ and $ and it matches. The match then looks like this:
1: 7 // <-- Works as expected.
2: 7
3: 15 // <-- Here I'd expected 2 groups matching: (13,14), (15,16)
4: 16 // <-- but I'm only getting the last group.
1: 8 // <-- This works and the remainder is as expected.
2: 8
3: 17
4: 18
I seem to be missing "13, 14" from my inner group, which should match one or more (?: - m: (\d+) t: (\d+))+ combinations.
Online test: http://gskinner.com/RegExr/?33urf. In case that gets butchered, my data there is: - st: 7 ov: 7 againothers: - m: 11 t: 12 - m: 13 t: 14 - m: 15 t: 16 - st: 8 ov: 8 againothers: - m: 17 t: 18
and the regex is: (?:- st: (\d+) ov: (\d+) againothers: ?(?: - m: (\d+) t: (\d+))+)+
I've read http://www.regular-expressions.info/captureall.html, and I think my problem is related to that? Any tips/pointers/help so I can match one or more m:t: combinations?
Upvotes: 3
Views: 4853
Reputation:
Most regex engines do not allow multiple captures from the same set of parentheses within a repeating group. If capturing parentheses match more than once, you get what matched last as the result.
The simplest work-around is to make a regex for only that sub-pattern and then get the results captured from each time it matches.
In other words, first get the relevant portion of the string and then use a regex like this on it:
/ - m: (\d+) t: (\d+)/
(Using whatever mechanism your language uses to match all).
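As a minimal sketch of that two-step approach, assuming Python (the question does not name a language): the outer pattern finds each st/ov block and captures its whole run of m/t pairs as one group, and `re.findall` then extracts every pair from that run. The names `block_re`, `pair_re`, and `parse` are made up for illustration.

```python
import re

data = ("- st: 7 ov: 7 againothers: - m: 11 t: 12 - m: 13 t: 14 - m: 15 t: 16 "
        "- st: 8 ov: 8 againothers: - m: 17 t: 18")

# Outer pattern: one "- st: .. ov: .." block; group 3 captures the whole run of m/t pairs.
block_re = re.compile(r"- st: (\d+) ov: (\d+) againothers: ?((?: - m: \d+ t: \d+)+)")
# Inner pattern: a single m/t pair, applied repeatedly to the captured run.
pair_re = re.compile(r" - m: (\d+) t: (\d+)")

def parse(text):
    """Return a list of (st, ov, [(m, t), ...]) tuples, one per block."""
    result = []
    for block in block_re.finditer(text):
        st, ov, pairs = block.groups()
        result.append((int(st), int(ov),
                       [(int(m), int(t)) for m, t in pair_re.findall(pairs)]))
    return result

print(parse(data))
```

Because the inner pattern runs as its own match-all pass, no capture is overwritten by a later repetition.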
Upvotes: 3
Reputation: 93086
Your groups get the following numbers:
^(?:- st: (\d+) ov: (\d+) againothers: ?(?: - m: (\d+) t: (\d+))+)+$
          1         2                            3        4
They are numbered in the order of their opening brackets.
If the repeated part of the expression matches a second time, the content of its capturing groups is overwritten.
You are repeating a capturing group.
As far as I know, in .NET it is possible to access all of those captures, but in most other regex implementations the group content is overwritten.
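The overwriting is easy to reproduce. The snippet below is only an illustration using Python's standard `re` module (my choice of language, with a shortened made-up input); it shows that a capturing group inside a repeated group keeps only its last repetition.

```python
import re

# Group 1 sits inside a repeated non-capturing group,
# so every repetition overwrites the previous capture.
m = re.search(r"(?: - m: (\d+))+", " - m: 11 - m: 13 - m: 15")

whole = m.group(0)  # the entire repeated run
last = m.group(1)   # only the final repetition survives
print(last)         # 15
```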
Upvotes: 2