Eduardo Almeida

Reputation: 408

Python - Find sequence of same characters

I'm trying to use a regex to match runs of one or more consecutive occurrences of the same character in a string.

Example:

string = "55544355"
# The regex should retrieve sequences "555", "44", "3", "55"

Can I have a few tips?

Upvotes: 5

Views: 3372

Answers (3)

Iron Fist

Reputation: 10951

Probably not the best option here, but for the sake of variety, how about this logic:

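>>> def f(s):
...     if not s:               # an empty string has no runs (s[0] would fail)
...         return []
...     l = []
...     c = s[0]                # current run, seeded with the first character
...     for x in s[1:]:         # skip s[0] so it isn't counted twice
...         if x == c[-1]:      # same character as the current run: extend it
...             c += x
...             continue
...         l.append(c)         # character changed: close the current run
...         c = x               # and start a new one
...     l.append(c)             # close the final run
...     return l
...
>>> f('55544355')
['555', '44', '3', '55']
>>> f('123444555678999001')
['1', '2', '3', '444', '555', '6', '7', '8', '999', '00', '1']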

Upvotes: 2

alecxe

Reputation: 473803

You can use re.findall() and the ((.)\2*) regular expression:

>>> import re
>>> [item[0] for item in re.findall(r"((.)\2*)", string)]
['555', '44', '3', '55']

The key part is inside the outer capturing group: (.)\2*. Here we capture a single character with (.) and then refer back to that same character by its group number, \2. The group number is 2 because the outer capturing group is number 1. The * quantifier means 0 or more times, so a lone character such as "3" still counts as a run.
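To see why item[0] is needed, here is a small standalone sketch (the names s, pairs and runs are just for illustration) of the raw output of re.findall() when the pattern contains two capturing groups: it returns one tuple per match rather than the matched strings.

```python
import re

s = "55544355"

# With two capturing groups, re.findall returns one tuple per match:
# (group 1 = the whole run, group 2 = the single repeated character).
pairs = re.findall(r"((.)\2*)", s)
print(pairs)  # [('555', '5'), ('44', '4'), ('3', '3'), ('55', '5')]

# Taking element 0 of each tuple recovers the runs themselves.
runs = [whole for whole, _char in pairs]
print(runs)   # ['555', '44', '3', '55']
```

The second element of each tuple is the repeated character, which can be handy if you also want to know which character each run consists of.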

You could also solve it with a single capturing group and re.finditer():

>>> [item.group(0) for item in re.finditer(r"(.)\1*", string)]
['555', '44', '3', '55']
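One caveat worth knowing with these patterns: by default . does not match a newline, so runs of "\n" are silently skipped. A minimal sketch, assuming an illustrative input string s that contains newlines, comparing the default behaviour with the re.DOTALL flag:

```python
import re

s = "aa\n\nbb"

# By default "." does not match "\n", so the two newlines are skipped
# instead of being returned as a run.
default = [m.group(0) for m in re.finditer(r"(.)\1*", s)]
print(default)  # ['aa', 'bb']

# With re.DOTALL, "." matches any character, including "\n".
dotall = [m.group(0) for m in re.finditer(r"(.)\1*", s, re.DOTALL)]
print(dotall)   # ['aa', '\n\n', 'bb']
```

For a purely numeric string like the one in the question this makes no difference, but it matters for arbitrary text.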

Upvotes: 8

niemmi

Reputation: 17263

You can do this easily without regex using itertools.groupby:

>>> from itertools import groupby
>>> s = '55544355'
>>> [''.join(g) for _, g in groupby(s)]
['555', '44', '3', '55']
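groupby only groups consecutive equal items (it does not sort), which is why the two runs of "5" stay separate. Each item it yields is a (key, group) pair, where key is the repeated character and group is an iterator over that run. A short sketch (the names encoded and runs are illustrative) that uses the key to build a run-length encoding and then rebuilds the runs from it:

```python
from itertools import groupby

s = "55544355"

# key is the repeated character; group iterates over the consecutive run.
encoded = [(key, len(list(group))) for key, group in groupby(s)]
print(encoded)  # [('5', 3), ('4', 2), ('3', 1), ('5', 2)]

# Rebuilding the runs from (character, count) pairs gives the same
# result as ''.join(g).
runs = [key * count for key, count in encoded]
print(runs)     # ['555', '44', '3', '55']
```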

Upvotes: 7
