u936293

Reputation: 16244

Explain the behavior of this re

I have the following:

>>> re.sub('(..)+?/story','\\g<1>','money/story')
'mey'
>>>

Why is capture group 1 the first letter and last two letters of money and not the first two letters?

Upvotes: 1

Views: 31

Answers (2)

Avinash Raj

Reputation: 174706

Because money contains 5 letters (an odd number, not even), the match cannot even start at the first letter m. (..)+? captures two characters at a time and lazily repeats that one or more times, so it can only consume an even number of characters; starting from m, no even-length run ends exactly where /story begins. Because the repetition quantifier + is applied to the capturing group, the group keeps only the characters from its last repetition, i.e. the last two characters of the text matched by (..)+?. That is why the captured string is ey and not the first pair. Replacing the whole match (oney/story) with the contents of group 1 (ey) leaves the unmatched m in front, so you get mey.
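The explanation can be checked by inspecting the match object directly with the standard re module. The input moneys/story is a hypothetical even-length example added here for contrast; it is not from the question. The raw string r'\g<1>' is the same template as the question's '\\g<1>'.

```python
import re

PATTERN = '(..)+?/story'

# The question's input: 'money' has 5 letters, so the match cannot start at 'm'.
m = re.search(PATTERN, 'money/story')
print(m.span())    # → (1, 11)  the match starts at 'o', not 'm'
print(m.group(0))  # → oney/story
print(m.group(1))  # → ey  only the last two-character repetition is kept

# Hypothetical even-length prefix for contrast: now the match can start at index 0.
m_even = re.search(PATTERN, 'moneys/story')
print(m_even.span(), m_even.group(1))             # → (0, 12) ys
print(re.sub(PATTERN, r'\g<1>', 'moneys/story'))  # → ys
```

With an even-length prefix nothing is left outside the match, so the result is just the last captured pair.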


Upvotes: 1

Jerry

Reputation: 71538

The first capture group does not contain m at all. What is being matched by (..)+?/story is oney/story.

The (..)+? matches an even number of characters, so the following is matched (spaced out to make it clearer):
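To see why nothing is matched starting from m, you can roughly mimic what re.search does: try the pattern anchored at each start position in turn and stop at the first success. This is a simplified sketch for illustration, not how the re engine is implemented internally; it relies on the pos argument of a compiled pattern's match() method.

```python
import re

pattern = re.compile('(..)+?/story')
text = 'money/story'

# Try an anchored match at each start position, as re.search effectively does.
first_hit = None
for pos in range(len(text)):
    m = pattern.match(text, pos)
    print(pos, m.group(0) if m else None)
    if m:
        first_hit = m
        break
# Prints "0 None" (no even-length run from 'm' ends at '/story'),
# then "1 oney/story" and stops.
```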

m o n e y / s t o r y
  ^-^ ^-^

Then the replacement is the first capture group. Something you might not know is that when a capture group is repeated (in this case (..)+?), only the value from its last repetition is kept.
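The "last repetition wins" rule is easy to confirm in isolation. The sketch also shows one possible way to keep the first pair instead, which is my own suggestion rather than something from the question: capture a single pair and let a non-capturing group (?:..)*? absorb the rest.

```python
import re

# A repeated capturing group keeps only its final iteration.
m = re.match('(..)+', 'abcdef')
print(m.group(0), m.group(1))  # → abcdef ef

# Alternative pattern: capture the first pair, repeat the rest without capturing.
first_pair = re.sub(r'(..)(?:..)*?/story', r'\g<1>', 'money/story')
print(first_pair)  # → mon

# The individual pairs can still be recovered by splitting the matched prefix.
prefix = re.search('(..)+?/story', 'money/story').group(0)[:-len('/story')]
print(re.findall('..', prefix))  # → ['on', 'ey']
```

Note that the m is still left outside the match in the alternative pattern, because the prefix before /story is still odd-length; only the captured pair changes from ey to on.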

To summarise, oney/story is matched, and replaced with ey, so the result is mey.
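The summary can be checked by rebuilding the result by hand: keep the text before the match, substitute the expanded template, and keep the text after it. Match.expand applies the same template syntax that re.sub uses; this sketch only covers the single-match case that occurs here.

```python
import re

text = 'money/story'
template = r'\g<1>'
m = re.search('(..)+?/story', text)

# Text before the match + expanded replacement + text after the match.
manual = text[:m.start()] + m.expand(template) + text[m.end():]
print(manual)                                         # → mey
print(manual == re.sub('(..)+?/story', template, text))  # → True
```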

Upvotes: 1
