Reputation: 2285
I am trying to match any run of two or more consecutive occurrences of the SAME character, where that character is not a . (period).
Let's say I have
line = '....xooo......'
If I do this,
matches = re.findall(r'[^\.]{2,}', line)
matches is ['xooo'].
Instead, I only want "ooo", which is a run of the SAME character.
How do I do this?
Upvotes: 3
Views: 2553
Reputation: 103744
re.search(r'(([^.])\2{1,})', line).group(1)
Explanation of the pattern (([^.])\2{1,}):
  1st capturing group (([^.])\2{1,}): the whole run of repeated characters
  2nd capturing group ([^.]): a negated character class that matches any single character except .
  \2{1,}: matches the text saved in the 2nd capturing group again, one to infinite times (greedy)
If you want all the matches of that constraint:
>>> line = '....xooo...xx..yyyyy.'
>>> [t[0] for t in re.findall(r"(([^.])\2+)", line)]
['ooo', 'xx', 'yyyyy']
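As a possible variation, re.finditer yields match objects, so the whole run can be read with group(0) and the extra outer capturing group is not needed. This is only a minimal sketch; the helper name repeated_runs is hypothetical and not part of the re module:

```python
import re

def repeated_runs(text):
    """Return every run of two or more identical non-period characters."""
    # ([^.]) captures a single character that is not a period;
    # \1+ requires that same character to repeat at least once more.
    return [m.group(0) for m in re.finditer(r"([^.])\1+", text)]

print(repeated_runs('....xooo...xx..yyyyy.'))
```

With only one capturing group, the backreference becomes \1 instead of \2.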
Upvotes: 2
Reputation: 239443
line = '....xooo......aaaa...'
import re
print([whole for whole, _ in re.findall("(([^.])\\2+)", line)])
Output
['ooo', 'aaaa']
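([^.]) matches any character except . and captures it as a group. \\2 (the backreference \2, with the backslash doubled because the pattern is not a raw string) refers to the inner captured group, that is, the character matched by ([^.]), and + means at least once. So, it matches ooo.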
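To illustrate the escaping point, here is a small sketch comparing the doubled-backslash pattern with its raw-string spelling. The variable names escaped, raw, same_pattern and runs are chosen here for illustration only:

```python
import re

line = '....xooo......aaaa...'

escaped = "(([^.])\\2+)"  # normal string: the backslash has to be doubled
raw = r"(([^.])\2+)"      # raw string: the backslash is written once

# Both literals produce exactly the same pattern text.
same_pattern = (escaped == raw)

# findall returns (outer group, inner group) tuples; keep the outer group.
runs = [whole for whole, _ in re.findall(raw, line)]
print(same_pattern, runs)
```

Either spelling works; the raw string is usually easier to read once a pattern contains several backslashes.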
Upvotes: 1