sretko

Reputation: 640

Python re - findall vs finditer

I have the following string:

'3 4 4 5 5 5 2 2'

I need to extract all the consecutive occurrences out of it like so:

'44 555 22'

To do this I'm using the below code. It works fine:

import re

n = input().replace(' ', '')
result = re.finditer(r'(\d)\1+', n)
for match in result:
    # group(0) is the whole match, not just the captured digit
    print(match.group(0), end=' ')

How can I modify my regex so that I can use findall() instead? I tried this:

n = input().replace(' ', '')
result = re.findall(r'(\d)\1+', n)
print(result) 

It returns just this: ['4', '5', '2'].

What's the reason for this behavior? Looking at the regex, it seems findall() is returning only group 1 instead of group 0 (the whole match), and I can't call group() on the strings that findall() returns. Is there any way to change my pattern, or anything else I can do, to get the same result from findall()? For example: ['44', '555', '22'].

Upvotes: 3

Views: 1901

Answers (3)

Alfe

Reputation: 59516

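If the pattern contains capturing groups, findall() returns those groups instead of the complete match: a string per match for a single group, a tuple of strings per match for several groups. Only a pattern without groups yields the complete match. In your example, you could wrap the whole pattern in an outer group. The digit then becomes group 2, so the backreference has to be \2 instead of \1, and you select the first group as the result: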

[x for x,y in re.findall(r'((\d)\2+)', '33344555')]

returns:

['333', '44', '555']
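To make the rule concrete, here is a minimal sketch of how the return shape of findall() depends on the number of capturing groups. The variable names are made up for illustration, and the sample string is the question's input with the spaces removed.

```python
import re

s = '34455522'

# No capturing groups: findall() returns the whole matches.
no_groups = re.findall(r'\d+', s)

# One capturing group: findall() returns only that group's text.
one_group = re.findall(r'(\d)\1+', s)

# Two capturing groups: findall() returns one tuple per match.
two_groups = re.findall(r'((\d)\2+)', s)

print(no_groups)   # ['34455522']
print(one_group)   # ['4', '5', '2']
print(two_groups)  # [('44', '4'), ('555', '5'), ('22', '2')]
```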

But I personally would stick to finditer(). Why do you want to change it?
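If the only goal is to get a list rather than print in a loop, finditer() can be kept as it is. The following is a small sketch, assuming the question's sample input in place of input(); the name runs is just illustrative.

```python
import re

# The question's sample string, with spaces removed as in the original code.
n = '3 4 4 5 5 5 2 2'.replace(' ', '')

# group(0) is the complete match, so the pattern needs no extra group.
runs = [m.group(0) for m in re.finditer(r'(\d)\1+', n)]
print(runs)  # ['44', '555', '22']
```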

Btw, you do not need to prepare your input by stripping the spaces:

[x for x,y in re.findall(r'((\d)(?: \2)+)', '3 3 3 4 4 5 5 5')]

returns:

['3 3 3', '4 4', '5 5 5']
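The matches above still contain the spaces. As a rough sketch, assuming single-digit tokens separated by exactly one space as in the question, the spaces can be dropped from each match afterwards to get the output format the question asks for. The names text and runs are illustrative only.

```python
import re

text = '3 4 4 5 5 5 2 2'

# Match a digit followed by one or more " <same digit>" repetitions,
# then remove the spaces from each complete match (the first group).
runs = [whole.replace(' ', '')
        for whole, _digit in re.findall(r'((\d)(?: \2)+)', text)]
print(runs)  # ['44', '555', '22']
```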

Upvotes: 3

internet_user

Reputation: 3279

import re

result = re.findall(r"((\d)\2+)", "34455522")
print(result)  # -> [('44', '4'), ('555', '5'), ('22', '2')]
result = [elem[0] for elem in result]
print(result)  # -> ['44', '555', '22']

Capture the whole run of repeated digits in an outer group, and take only that group from each tuple.

Upvotes: 1

fredtantini

Reputation: 16566

You could capture the repeated \1+ part in a second group too, and then use a list comprehension to join the two groups together (n here is the question's input with the spaces removed):

>>> re.findall(r'(\d)(\1+)', n)
[('4', '4'), ('5', '55'), ('2', '2')]
>>> [''.join(i) for i in re.findall(r'(\d)(\1+)', n)]
['44', '555', '22']

Upvotes: 1
