RogerFC

Reputation: 329

Regular expression grouping with lookaheads (in Python)

I'm modifying a regular expression so that an outer group wraps several inner groups, but this 'supergroup' does not return the composite matched string as expected.

The string to match is of the form:

/DIR/SOMESTRING-W0.12+345.raw.gz

and the regex I'm using:

/DIR/
(?P<super>
    (?P<name>.*?)
    (?=(?P<modifier>-W\d\.\d{2}[+-]\d{3})?\.(?P<extension>raw\.gz|root)$)
)

I'm getting the following results for the named groups:

modifier: '-W0.12+345'
super: 'SOMESTRING'
name: 'SOMESTRING'
extension: 'raw.gz'

while I was expecting

super: 'SOMESTRING-W0.12+345.raw.gz'

The grouping of subgroups has always worked for me, but not this time, and I cannot understand why.

I hope someone can give me a hint.

NOTE: This regex is explained in the question "matching a specific substring with regular expressions using awk".

Upvotes: 1

Views: 853

Answers (1)

Tim Pietzcker

Reputation: 336218

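The group super matches the same text that the group name matches, because a lookahead assertion doesn't contribute any actual characters to the match (that's why lookaheads are also called "zero-width assertions"). The groups inside the lookahead (modifier and extension) are still captured, but the text they match is not consumed, so it never becomes part of super.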
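As a minimal sketch of that behaviour (the strings "foo" and "bar" and the group names outer and inner are made up for illustration, they are not from the question), a group that contains only a lookahead after some text ends where the consumed text ends:

```python
import re

# (?=...) checks that "bar" follows "foo" but consumes nothing,
# so the enclosing group "outer" stops right after "foo".
# The group "inner" inside the lookahead is still captured.
m = re.match(r"(?P<outer>foo(?=(?P<inner>bar)))", "foobar")

outer = m.group("outer")
inner = m.group("inner")
end_pos = m.end()  # position where the overall match ends

print(outer)    # foo
print(inner)    # bar
print(end_pos)  # 3
```

The overall match ends at position 3 even though the lookahead inspected the text up to position 6, which is the same effect seen with super in the question.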

To get the desired result, remove the lookahead wrapper (?=...) and keep its contents, so that the modifier and extension are consumed as part of the match:

/DIR/
(?P<super>
    (?P<name>.*?)
    (?P<modifier>-W\d\.\d{2}[+-]\d{3})?\.(?P<extension>raw\.gz|root)$
)
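For reference, here is a sketch of how this pattern could be run in Python. It assumes the pattern is compiled with re.VERBOSE (the question does not say so, but the multi-line layout only works that way); the names PATTERN, match and groups are only for illustration.

```python
import re

# Assumption: the multi-line pattern is meant to be used with re.VERBOSE,
# which ignores the whitespace and line breaks inside the pattern.
PATTERN = re.compile(
    r"""
    /DIR/
    (?P<super>
        (?P<name>.*?)
        (?P<modifier>-W\d\.\d{2}[+-]\d{3})?\.(?P<extension>raw\.gz|root)$
    )
    """,
    re.VERBOSE,
)

match = PATTERN.match("/DIR/SOMESTRING-W0.12+345.raw.gz")
groups = match.groupdict()

print(groups["super"])      # SOMESTRING-W0.12+345.raw.gz
print(groups["name"])       # SOMESTRING
print(groups["modifier"])   # -W0.12+345
print(groups["extension"])  # raw.gz
```

Because nothing is hidden inside a lookahead any more, super now spans everything its subgroups consume.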

Upvotes: 2
