yasu.neko

Reputation: 145

Finding repeated characters using re.match

I'm trying to figure out why the following code prints "2" instead of "1":

import re

line = "9111222"
m = re.match( r'.*(\w)\1+', line)
print(m.group(1))

I understand that re.match attempts to match at the beginning of the string, but I assumed it would then find the "111" first and print "1".

Upvotes: 2

Views: 92

Answers (2)

Skycc

Reputation: 3555

The * quantifier is greedy. Add a ? after it to make it non-greedy if you want the match to stop at the 111:

import re

line = "9111222"
#                 ^
m = re.match( r'.*?(\w)\1+', line)
print(m.group(1))
# '1'
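
If the goal is simply to find the first run of a repeated character, a possible alternative (not part of the original answer) is to drop the leading .*? and use re.search, which scans forward for the first position where the pattern matches. The variable names below are only illustrative:

```python
import re

line = "9111222"

# re.search tries each start position from left to right,
# so no leading .* or .*? is needed to skip the "9".
first = re.search(r'(\w)\1+', line)
first_char = first.group(1)   # the repeated character of the first run
first_run = first.group(0)    # the whole run that was matched

# re.finditer yields every non-overlapping run, left to right.
runs = [(m.group(1), m.group(0)) for m in re.finditer(r'(\w)\1+', line)]

print(first_char, first_run)  # 1 111
print(runs)                   # [('1', '111'), ('2', '222')]
```

re.match would not work with this shorter pattern, because it only tries position 0, where "9" is not followed by another "9".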

Upvotes: 2

qxz

Reputation: 3864

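The * quantifier in regular expressions is greedy, meaning it tries to match as many characters as possible and gives some back only when the rest of the pattern cannot match otherwise. In your string, the split 91112(2)2 lets .* consume the most characters ("91112") while still leaving two 2s for (\w)\1+, so that's the match the engine selects, with the second-to-last 2 being captured.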

See https://regex101.com/r/IkM5gX/2
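
The same thing can be checked locally by looking at where group 1 sits in the string. This is a minimal sketch; describe is a hypothetical helper written just for this comparison:

```python
import re

line = "9111222"

def describe(pattern, text):
    """Return (captured char, its (start, end) span, full matched text)."""
    m = re.match(pattern, text)
    return m.group(1), m.span(1), m.group(0)

# Greedy .* grabs "91112" and leaves only the last two 2s for (\w)\1+.
greedy = describe(r'.*(\w)\1+', line)

# Lazy .*? grabs as little as possible ("9"), so (\w) lands on the first 1.
lazy = describe(r'.*?(\w)\1+', line)

print(greedy)  # ('2', (5, 6), '9111222')
print(lazy)    # ('1', (1, 2), '9111')
```

The span (5, 6) confirms that the greedy pattern captures the second-to-last character, while the lazy pattern captures the character at index 1 and stops the overall match after "9111".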

Upvotes: 4
