Reputation: 10820
I want to split a string like:
'aaabbccccabbb'
into
['aaa', 'bb', 'cccc', 'a', 'bbb']
What's an elegant way to do this in Python? If it makes it easier, it can be assumed that the string will only contain a's, b's and c's.
Upvotes: 8
Views: 359
Reputation: 133554
>>> import re
>>> s = 'aaabbccccabbb'
>>> [m.group() for m in re.finditer(r'(\w)(\1*)',s)]
['aaa', 'bb', 'cccc', 'a', 'bbb']
Upvotes: 1
Reputation: 9322
Here's the best way I could find using regex:
import re
print([a for a, b in re.findall(r"((\w)\2*)", s)])
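The outer group is needed because re.findall returns one tuple per match when the pattern contains more than one group, so the whole run has to be captured explicitly alongside its first character. A minimal sketch of the same idea, assuming a hypothetical helper name split_runs_regex (not from the original post) and swapping \w for . with re.DOTALL so that spaces, punctuation and newlines are grouped as well:

```python
import re

def split_runs_regex(text):
    # Hypothetical helper for illustration.
    # Group 1 is the whole run, group 2 is its first character;
    # \2* then consumes any further repeats of that character.
    return [run for run, _ in re.findall(r"((.)\2*)", text, flags=re.DOTALL)]

print(split_runs_regex('aaabbccccabbb'))  # ['aaa', 'bb', 'cccc', 'a', 'bbb']
```

An empty input simply produces an empty list, since the pattern needs at least one character to match.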
Upvotes: 2
Reputation: 110301
You can write a generator, without trying to be clever and ending up with something short but unreadable:
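def yield_same(string):
    it_str = iter(string)
    # First character of the first run; "" if the string is empty
    result = next(it_str, "")
    for next_chr in it_str:
        if next_chr != result[0]:
            # The current run has ended: emit it and start a new one
            yield result
            result = ""
        result += next_chr
    if result:
        yield result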
>>> list(yield_same("aaaaaabcbcdcdccccccdddddd"))
['aaaaaa', 'b', 'c', 'b', 'c', 'd', 'c', 'd', 'cccccc', 'dddddd']
Edit: OK, so there is itertools.groupby, which probably does something like this.
Upvotes: 3
Reputation: 95308
That is the use case for itertools.groupby :)
>>> from itertools import groupby
>>> s = 'aaabbccccabbb'
>>> [''.join(y) for _,y in groupby(s)]
['aaa', 'bb', 'cccc', 'a', 'bbb']
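For context, groupby yields (key, group) pairs, where each group is a lazy iterator over one run of equal consecutive items; ''.join(y) turns that run back into a string. A small sketch, with hypothetical helper names split_runs and run_lengths (neither is from the original answer), that also shows the key being used:

```python
from itertools import groupby

def split_runs(text):
    # Each group is an iterator over one run of identical characters.
    return [''.join(group) for _, group in groupby(text)]

def run_lengths(text):
    # The key is the repeated character; count the run by consuming the group.
    return [(char, sum(1 for _ in group)) for char, group in groupby(text)]

print(split_runs('aaabbccccabbb'))   # ['aaa', 'bb', 'cccc', 'a', 'bbb']
print(run_lengths('aaabbccccabbb'))  # [('a', 3), ('b', 2), ('c', 4), ('a', 1), ('b', 3)]
```

Because groupby only compares neighbouring items, the second run of a's and b's stays separate from the first, which is exactly what the question asks for.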
Upvotes: 26