I'm trying to use regex to match sequences of one or more repetitions of the same character in a string.
Example:
string = "55544355"
# The regex should retrieve sequences "555", "44", "3", "55"
Can I have a few tips?
Probably not the best option here, but for the sake of variety, how about this logic:
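>>> def f(s):
...     if not s:  # s[0] below would raise IndexError on ''
...         return []
...     l = []
...     c = s[0]  # the run currently being built
...     for x in s[1:]:  # s[0] is already in c, so start at the second char
...         if x == c[0]:  # same character: extend the current run
...             c += x
...             continue
...         l.append(c)  # character changed: close the run, start a new one
...         c = x
...     l.append(c)  # close the final run
...     return l
...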
>>> f('55544355')
['555', '44', '3', '55']
>>> f('123444555678999001')
['1', '2', '3', '444', '555', '6', '7', '8', '999', '00', '1']
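For comparison, the same loop idea can be sketched as a generator that remembers where the current run starts and slices it out when the character changes, instead of building each run one character at a time. The name runs is a hypothetical helper, not part of any library; this is just one possible variant, and it also returns an empty list for an empty string.

```python
from typing import Iterator


def runs(s: str) -> Iterator[str]:
    """Yield maximal runs of identical consecutive characters in s (hypothetical helper)."""
    start = 0  # index where the current run begins
    for i in range(1, len(s) + 1):
        # A run ends at the end of the string or where the character changes.
        if i == len(s) or s[i] != s[start]:
            yield s[start:i]
            start = i


print(list(runs("55544355")))  # ['555', '44', '3', '55']
print(list(runs("")))          # []
```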
You can use re.findall() with the ((.)\2*) regular expression:
>>> import re
>>> [item[0] for item in re.findall(r"((.)\2*)", string)]
['555', '44', '3', '55']
The key part is inside the outer capturing group: (.)\2*. Here we capture a single character via (.), then reference that character by its group number, \2. The group number is 2 because the outer capturing group is number 1. Finally, * means 0 or more times, so \2* matches any further repetitions of the captured character.
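To see why item[0] is needed, it helps to look at what re.findall returns for this pattern. When a pattern contains more than one capturing group, findall returns one tuple of all groups per match, so group 1 (the whole run) and group 2 (the single repeated character) come back together. A minimal sketch, reusing the example string from the question; the names pairs and sequences are just illustrative:

```python
import re

string = "55544355"

# Two capturing groups, so each match becomes a tuple:
# (whole run from group 1, single character from group 2).
pairs = re.findall(r"((.)\2*)", string)
print(pairs)  # [('555', '5'), ('44', '4'), ('3', '3'), ('55', '5')]

# Keeping only the first element of each tuple gives the runs themselves.
sequences = [whole for whole, _char in pairs]
print(sequences)  # ['555', '44', '3', '55']
```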
You could've also solved it with a single capturing group and re.finditer():
>>> [item.group(0) for item in re.finditer(r"(.)\1*", string)]
['555', '44', '3', '55']
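One caveat that applies to both patterns: by default . does not match a newline, so runs of "\n" characters are silently skipped. If the input may contain newlines, passing the re.DOTALL flag lets . match them as well. A short sketch, where text is an arbitrary sample string chosen for illustration:

```python
import re

text = "ab\n\ncc"  # sample input containing a run of two newlines

# Default flags: "." does not match "\n", so the newline run is skipped.
default = [m.group(0) for m in re.finditer(r"(.)\1*", text)]
print(default)  # ['a', 'b', 'cc']

# With re.DOTALL, "." also matches "\n", so the newline run is returned too.
dotall = [m.group(0) for m in re.finditer(r"(.)\1*", text, re.DOTALL)]
print(dotall)  # ['a', 'b', '\n\n', 'cc']
```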
You can do this easily without regex using itertools.groupby:
>>> from itertools import groupby
>>> s = '55544355'
>>> [''.join(g) for _, g in groupby(s)]
['555', '44', '3', '55']
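The _ in the comprehension above discards the key that groupby yields alongside each group. That key is the repeated character, and groupby only groups consecutive equal items, which is why the two runs of 5 stay separate. A small sketch that keeps the key, for example to get run lengths (pairs and encoded are illustrative names):

```python
from itertools import groupby

s = "55544355"

# groupby yields (key, group) pairs: the key is the character, and the
# group iterator produces each consecutive occurrence of it.
pairs = [(char, "".join(group)) for char, group in groupby(s)]
print(pairs)  # [('5', '555'), ('4', '44'), ('3', '3'), ('5', '55')]

# Keeping the key but counting the group gives (character, run length) pairs.
encoded = [(char, sum(1 for _ in group)) for char, group in groupby(s)]
print(encoded)  # [('5', 3), ('4', 2), ('3', 1), ('5', 2)]
```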