Hery

Reputation: 7619

Python string splitting

I have an input string like this: a1b2c30d40 and I want to tokenize the string to: a, 1, b, 2, c, 30, d, 40.

I know I can read the characters one by one and keep track of the previous character to decide whether to start a new token (two digits in a row belong to the same token), but is there a more Pythonic way of doing this?

Upvotes: 7

Views: 1029

Answers (1)

Cat Plus Plus

Reputation: 129754

>>> import re
>>> re.split(r'(\d+)', 'a1b2c30d40')
['a', '1', 'b', '2', 'c', '30', 'd', '40', '']

On the pattern: \d means "match one digit" and + is a quantifier that means "match one or more", so \d+ means "match as many consecutive digits as possible". The pattern is wrapped in a group (), so in the context of re.split the whole thing means "split this string using runs of digits as the separator, and also include the matched separators in the result". If you omit the group, you get ['a', 'b', 'c', 'd', '']. The trailing '' shows up in both cases because the string ends with a separator, which leaves an empty piece after it.
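If the trailing empty string is unwanted, one option is to filter it out after splitting. The sketch below is not part of the original answer: the function names tokenize and tokenize_findall are made up for illustration, and the second function swaps in re.findall as an alternative to re.split.

```python
import re

def tokenize(text):
    """Split text into alternating runs of non-digits and digits."""
    # The capturing group keeps the digit runs in the result; the filter
    # drops the empty strings produced when the input starts or ends
    # with digits.
    return [tok for tok in re.split(r'(\d+)', text) if tok]

def tokenize_findall(text):
    """Alternative: match the tokens directly instead of splitting."""
    # \d+ matches a run of digits, \D+ a run of anything else.
    return re.findall(r'\d+|\D+', text)

print(tokenize('a1b2c30d40'))
# ['a', '1', 'b', '2', 'c', '30', 'd', '40']
```

Both functions give the same list for the example input. The findall version never produces empty strings, so it needs no filtering.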

Upvotes: 13
