overedge

Reputation: 13

Regex for split or findall each digit python

What is the best way to split this string into a list of runs of consecutive identical digits?

My solution:

>>> str
'2223334441214844'
>>> filter(None, re.split("(0+)|(1+)|(2+)|(3+)|(4+)|(5+)|(6+)|(7+)|(8+)|(9+)", str))
['222', '333', '444', '1', '2', '1', '4', '8', '44']

Upvotes: 0

Views: 399

Answers (5)

Aaditya Ura

Reputation: 12689

What about doing it without importing any module?

You can write your own logic in pure Python without importing anything. Here is a recursive approach:

string_1 = '2223334441214844'

list_2 = list(string_1)


def con(list_1):
    # Prints each completed run and returns the final run.
    group = []
    if not list_1:
        return group
    track = list_1[0]
    for j, i in enumerate(list_1):
        if i == track:
            group.append(i)
        else:
            # The run ended: print it, then recurse on the remainder.
            print(group)
            return con(list_1[j:])
    return group


print(con(list_2))

output:

['2', '2', '2']
['3', '3', '3']
['4', '4', '4']
['1']
['2']
['1']
['4']
['8']
['4', '4']
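
For comparison, here is a non-recursive sketch of the same pure-Python idea that returns the runs as strings instead of printing them. The function name split_runs is hypothetical and not part of the answer above.

```python
def split_runs(text):
    """Split text into runs of consecutive identical characters."""
    runs = []
    for ch in text:
        if runs and runs[-1][0] == ch:
            # Same character as the current run: extend it.
            runs[-1] += ch
        else:
            # Different character (or the first one): start a new run.
            runs.append(ch)
    return runs


print(split_runs('2223334441214844'))
# ['222', '333', '444', '1', '2', '1', '4', '8', '44']
```

Avoiding recursion also sidesteps Python's recursion limit on inputs with very many runs.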

Upvotes: 0

poke

Reputation: 388443

A more flexible way is to use itertools.groupby, which is made to group consecutive equal items in an iterable:

>>> s = '2223334441214844'
>>> import itertools
>>> [''.join(group) for key, group in itertools.groupby(s)]
['222', '333', '444', '1', '2', '1', '4', '8', '44']

The key is the value being grouped on (in your case, the digit), and the group is an iterable of all the items in that run. Since the source iterable is a string, each item is a character, so to get the whole run back as a string we need to join the characters together.
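
To make the key/group pairing concrete, here is a small sketch that shows what groupby yields for the same string before anything is joined. The variable name pairs is only for illustration.

```python
from itertools import groupby

s = '2223334441214844'

# Each group is a one-shot iterator, so it is materialised with list() here.
pairs = [(key, list(group)) for key, group in groupby(s)]

print(pairs[:3])
# [('2', ['2', '2', '2']), ('3', ['3', '3', '3']), ('4', ['4', '4', '4'])]
```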

You could also repeat the key for the length of the group to get this output:

>>> [key * len(list(group)) for key, group in itertools.groupby(s)]
['222', '333', '444', '1', '2', '1', '4', '8', '44']

If you wanted to use regular expressions, you could make use of backreferences to find consecutive characters without having to specify them explicitly:

>>> re.findall('((.)\\2*)',  s)
[('222', '2'), ('333', '3'), ('444', '4'), ('1', '1'), ('2', '2'), ('1', '1'), ('4', '4'), ('8', '8'), ('44', '4')]

For finding consecutive characters in a string, this is essentially the same as what groupby does. You can then pick the full match (the first item of each tuple) to get the desired result:

>>> [x for x, *_ in re.findall('((.)\\2*)',  s)]
['222', '333', '444', '1', '2', '1', '4', '8', '44']

Upvotes: 4

Barmar

Reputation: 782785

Use a capture group and backreference.

str = '2223334441214844'

import re
print([i[0] for i in re.findall(r'((\d)\2*)', str)])

\2 matches whatever the (\d) capture group matched. The list comprehension is needed because when the RE contains capture groups, findall returns the capture groups (as a tuple per match when there is more than one group), not the whole match. So we need an extra outer group to capture the whole match, and then need to extract that group from each result.
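
A short sketch of that findall behaviour, comparing the pattern with and without the extra outer group. The variable names are illustrative only.

```python
import re

s = '2223334441214844'

# One capture group: findall returns only what that group captured,
# i.e. a single digit per run.
only_group = re.findall(r'(\d)\1*', s)
print(only_group)
# ['2', '3', '4', '1', '2', '1', '4', '8', '4']

# Two capture groups: findall returns one (outer, inner) tuple per match.
with_outer = re.findall(r'((\d)\2*)', s)
print(with_outer[0])
# ('222', '2')

# Keep only the outer group, which spans the whole run.
whole = [m[0] for m in with_outer]
print(whole)
# ['222', '333', '444', '1', '2', '1', '4', '8', '44']
```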

Upvotes: 1

Wiktor Stribiżew

Reputation: 627607

If you only need to extract consecutive identical digits, you can match them directly with the regex r'(\d)\1*':

import re
s='2223334441214844'
print([x.group() for x in re.finditer(r'(\d)\1*', s)])
# => ['222', '333', '444', '1', '2', '1', '4', '8', '44']


Here,

  • (\d) - matches and captures into Group 1 any digit
  • \1* - a backreference to Group 1 matching the same value, 0+ repetitions.

This solution can be customized to match any specific consecutive chars (instead of \d, you may use \S for non-whitespace, \w for word chars, [a-fA-F] for a specific set, etc.). If you replace \d with . and use the re.DOTALL modifier, it will behave like the itertools solutions posted in the other answers.
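
A minimal sketch of that customization, assuming a hypothetical helper find_runs whose char_class argument is dropped into the pattern unescaped (so it should only ever receive a trusted regex fragment):

```python
import re
from itertools import groupby


def find_runs(text, char_class=r'\d', flags=0):
    """Return runs of one repeated character matched by char_class."""
    # char_class is used as-is, so it must be a valid regex fragment.
    pattern = re.compile('(' + char_class + r')\1*', flags)
    return [m.group() for m in pattern.finditer(text)]


print(find_runs('aab  112'))          # ['11', '2']
print(find_runs('aab  112', r'\w'))   # ['aa', 'b', '11', '2']

# With '.' and re.DOTALL every character (newlines included) is matched,
# which gives the same result as groupby on this input.
text = 'aa\n\nb'
print(find_runs(text, '.', re.DOTALL) == [''.join(g) for _, g in groupby(text)])
# True
```

Without re.DOTALL the dot skips newlines, so the newline run would be missing from the result.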

Upvotes: 1

Chris_Rands

Reputation: 41248

One solution without regex (that is not specific to digits) would be to use itertools.groupby():

>>> from itertools import groupby
>>> s = '2223334441214844'
>>> [''.join(g) for _, g in groupby(s)]
['222', '333', '444', '1', '2', '1', '4', '8', '44']

Upvotes: 4
