I'm modifying a regular expression to capture a group that contains other groups, but this 'supergroup' does not return the combined matched string as I expected.
The string to match is of the form:
/DIR/SOMESTRING-W0.12+345.raw.gz
and the regex I'm using:
```
/DIR/
(?P<super>
    (?P<name>.*?)
    (?=(?P<modifier>-W\d\.\d{2}[+-]\d{3})?\.(?P<extension>raw\.gz|root)$)
)
```
I'm getting the following results for the named groups:
modifier: '-W0.12+345'
super: 'SOMESTRING'
name: 'SOMESTRING'
extension: 'raw.gz'
whereas I was expecting:
super: 'SOMESTRING-W0.12+345.raw.gz'
Nesting groups inside a larger group has always worked for me before, but not this time, and I can't see why. I'd appreciate any hints.
NOTE: An explanation of this regex can be found in the question "matching a specific substring with regular expressions using awk".
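The group super matches the same text as the group name, because the lookahead assertion doesn't contribute any actual characters to the match (that's why lookaheads are also called "zero-width assertions"). The lookahead only checks that the modifier and extension follow name; the engine then continues from where name ended, so super ends there too.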
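A minimal reproduction of the behaviour described above, assuming the pattern is used with Python's re module and compiled with re.VERBOSE (the multi-line layout implies verbose mode; both are assumptions, not stated in the question). It shows that the groups inside the lookahead are still filled in, while super and the overall match stop exactly where name stops.

```python
import re

# Assumption: Python's `re` with re.VERBOSE, so whitespace and newlines
# in the pattern are ignored.
LOOKAHEAD_PATTERN = re.compile(r"""
    /DIR/
    (?P<super>
        (?P<name>.*?)
        (?=(?P<modifier>-W\d\.\d{2}[+-]\d{3})?\.(?P<extension>raw\.gz|root)$)
    )
""", re.VERBOSE)

path = "/DIR/SOMESTRING-W0.12+345.raw.gz"
m = LOOKAHEAD_PATTERN.match(path)

# The lookahead inspects the rest of the string but consumes nothing,
# so 'super' ends exactly where 'name' ends.
print(m.group("super"))      # SOMESTRING
print(m.group("name"))       # SOMESTRING
# Groups captured inside a successful lookahead are still populated.
print(m.group("modifier"))   # -W0.12+345
print(m.group("extension"))  # raw.gz
# The overall match also stops before the text the lookahead checked.
print(m.group(0))            # /DIR/SOMESTRING
```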
To get the desired result, remove the lookahead assertion so that the modifier and extension are actually consumed inside super:
```
/DIR/
(?P<super>
    (?P<name>.*?)
    (?P<modifier>-W\d\.\d{2}[+-]\d{3})?\.(?P<extension>raw\.gz|root)$
)
```
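A sketch of the corrected pattern under the same assumptions (Python's re, compiled with re.VERBOSE). The second input, /DIR/OTHER.root, is a hypothetical path added only to exercise the case where the optional modifier is absent.

```python
import re

# Corrected pattern: the lookahead is gone, so the modifier and extension
# are consumed and become part of 'super'.
# Assumption: Python's `re` with re.VERBOSE.
FIXED_PATTERN = re.compile(r"""
    /DIR/
    (?P<super>
        (?P<name>.*?)
        (?P<modifier>-W\d\.\d{2}[+-]\d{3})?\.(?P<extension>raw\.gz|root)$
    )
""", re.VERBOSE)

# The second path is a hypothetical example without a modifier.
paths = ("/DIR/SOMESTRING-W0.12+345.raw.gz", "/DIR/OTHER.root")

results = {}
for path in paths:
    m = FIXED_PATTERN.match(path)
    # groupdict() maps each named group to its text (None if it didn't take part).
    results[path] = m.groupdict()
    print(path, "->", results[path])
```

Because name is still lazy (.*?), it stops just before the optional modifier, so name keeps the value it had before while super now spans the rest of the filename.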