user2492270

Reputation: 2285

Regex: How to match sequence of SAME characters?

I am trying to match any run of consecutive identical characters, where the character is not a period (.).

Let's say I have

line = '....xooo......'

If I do this,

re.findall(r'[^.]{2,}', line)

the only match is "xooo".

Instead, I only want "ooo", which is a sequence of the SAME character.

How do I do this?

Upvotes: 3

Views: 2553

Answers (2)

dawg

Reputation: 103744

re.search(r'(([^.])\2{1,})', line).group(1)

Explanation:

"(([^.])\2{1,})"
    1st Capturing group (([^.])\2{1,})
    2nd Capturing group ([^.])
      Negated char class [^.] matches any character except:
         . The character .
    \2 1 to infinite times [greedy] Matches text saved in the 2nd capturing group
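To see what each group ends up holding, here is a small sketch run on the string from the question. The variable names (pattern, whole_run, repeated_char) are only illustrative, not part of the original answer:

```python
import re

line = '....xooo......'
pattern = r'(([^.])\2{1,})'

m = re.search(pattern, line)

# Group 2 is the single captured character; group 1 is the whole run,
# i.e. that character followed by one or more repeats of itself.
whole_run = m.group(1)
repeated_char = m.group(2)

print(whole_run, repeated_char, m.span(1))
```

The lone "x" is skipped because \2 requires the next character to equal the one just captured, and "x" is followed by "o".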

If you want all the matches of that constraint:

>>> line = '....xooo...xx..yyyyy.'
>>> [t[0] for t in re.findall(r"(([^.])\2+)", line)]
['ooo', 'xx', 'yyyyy']

Upvotes: 2

thefourtheye

Reputation: 239443

import re
line = '....xooo......aaaa...'
print([whole for whole, _ in re.findall("(([^.])\\2+)", line)])

Output

['ooo', 'aaaa']

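([^.]) matches any character except . and captures it as a group. \\2 refers to that inner group (group 2; the outer parentheses are group 1), i.e. the character matched by ([^.]), and + means it must repeat at least once more. So the pattern matches two or more of the same character in a row, such as ooo.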
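As a possible variation (not what the answer above uses), re.finditer with group(0) avoids both the outer capturing group and the tuple unpacking, since group(0) is always the whole match. The helper name same_char_runs is made up for this sketch:

```python
import re

def same_char_runs(text):
    """Return every run of two or more identical non-period characters."""
    # Only one capturing group is needed here, so the backreference is \1.
    return [m.group(0) for m in re.finditer(r'([^.])\1+', text)]

print(same_char_runs('....xooo......aaaa...'))
```

Note that [^.] also matches whitespace and newlines, so repeated spaces would count as a run too; narrow the character class if that is not wanted.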

Upvotes: 1
