Reputation:
What exactly is re.findall('(?=(b))','bbbb') doing? It returns ['b', 'b', 'b', 'b'], but I expected ['b', 'b', 'b'], since it should only return a 'b' if it sees another 'b' ahead?
Thanks!
Edit: It seems that re.findall('b(?=(b))','bbbb') returns ['b', 'b', 'b'] like I would expect, but I am still confused as to what re.findall('(?=(b))','bbbb') does.
Edit 2: Got it! Thank you for the responses.
Upvotes: 3
Views: 1730
Reputation: 163517
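A positive lookahead (?=...) asserts a position without consuming any characters. Here that position is found 4 times, because there are 4 positions in 'bbbb' that are directly followed by a b. Inside the assertion itself you capture that b in the capturing group (b), and the content of the group is what findall returns.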
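To make the zero-length matches visible, here is a minimal sketch that uses re.finditer instead of re.findall, so that each match object can be inspected. The variable names (pattern, matches) are only for illustration and are not from the question.

```python
import re

# Same pattern as in the question: a lookahead with a capturing group inside.
pattern = re.compile(r'(?=(b))')

# Collect (start, end, whole match, group 1) for every match in 'bbbb'.
matches = [
    (m.start(), m.end(), m.group(0), m.group(1))
    for m in pattern.finditer('bbbb')
]

for start, end, whole, captured in matches:
    # start == end: the match itself is empty, only the group holds a 'b'.
    print(start, end, repr(whole), repr(captured))
```

Each match starts and ends at the same index (0, 1, 2 and 3), so the match itself is an empty string, while group 1 holds the b that the lookahead saw. At index 4 nothing follows, so the lookahead fails there.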
If you want to get a b three times and you do not need the group anymore, you could match b and add a lookahead that asserts that what is directly to the right is a b:
print(re.findall('b(?=b)','bbbb'))
Upvotes: 1
Reputation: 371049
You have a zero-length match there, and you have a capturing group. When the regular expression passed to re.findall has a capturing group, the resulting list contains what was captured by that group instead of the whole match (with several groups, it is a list of tuples).
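As a rough illustration of that rule, the sketch below runs three variants of the same idea against 'abcd'. That sample string and the \w patterns are assumptions chosen so that the matched character and the captured character are easy to tell apart; they are not from the question.

```python
import re

text = 'abcd'  # illustrative input, not from the question

# No capturing group: findall returns the whole matches.
no_group = re.findall(r'\w(?=\w)', text)

# One capturing group (inside the lookahead): findall returns only the group.
one_group = re.findall(r'\w(?=(\w))', text)

# Two capturing groups: findall returns one tuple per match.
two_groups = re.findall(r'(\w)(?=(\w))', text)

print(no_group)    # ['a', 'b', 'c']
print(one_group)   # ['b', 'c', 'd']
print(two_groups)  # [('a', 'b'), ('b', 'c'), ('c', 'd')]
```

With 'bbbb' the same thing happens, it is just harder to see because every character is a b.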
Four positions are matched by your regex: the start of the string (just before the first b), before the second b, before the third b, and before the fourth b. Here's a diagram, where | represents the position being tried (spaces added for illustration):
 b b b b
|          captures the next b, passes
 b b b b
  |        captures the next b, passes
 b b b b
    |      captures the next b, passes
 b b b b
      |    captures the next b, passes
 b b b b
        |  lookahead fails, match fails
If you don't want a capturing group and only want to match the zero-length positions instead, use (?: instead of ( for a non-capturing group:
(?=(?:b))
(though the resulting list will consist only of empty strings, so it won't be very useful)
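A quick check of that claim, as a small sketch (the variable names are only for illustration):

```python
import re

# Lookahead with a non-capturing group: nothing is captured, so findall
# falls back to returning the (empty) whole matches.
non_capturing = re.findall(r'(?=(?:b))', 'bbbb')

# The inner group is not needed at all; a bare lookahead behaves the same.
bare_lookahead = re.findall(r'(?=b)', 'bbbb')

print(non_capturing)   # ['', '', '', '']
print(bare_lookahead)  # ['', '', '', '']
```

There are still four matches, one per position that is followed by a b, but each one is an empty string.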
Upvotes: 2
Reputation: 140276
The problem is that the capturing group is inside the lookahead.
To do what you want, you have to capture the letter itself and then use a lookahead that doesn't capture:
re.findall('(b)(?=b)','bbbb')
result:
['b', 'b', 'b']
Upvotes: 2