Reputation: 640
I have the following string:
'3 4 4 5 5 5 2 2'
I need to extract all the consecutive occurrences out of it like so:
'44 555 22'
To do this I'm using the code below, which works fine:
import re
n = input().replace(' ', '')
result = re.finditer(r'(\d)\1+', n)
for match in result:
    print(match.group(0), end=' ')
My question is how I can modify my regex so that I can use findall() instead. I tried this:
n = input().replace(' ', '')
result = re.findall(r'(\d)\1+', n)
print(result)
It returns just this: ['4', '5', '2'].
What's the reason for this behavior? Looking at the result, it seems findall returns only group 1 instead of group 0 (the whole match), and I don't think I can call group() on the result of findall. Is there any way I can change my pattern, or something else I can do, to get the same result from findall? For example: ['44', '555', '22'].
Upvotes: 3
Views: 1901
Reputation: 59516
findall() returns all parenthesized groups if there are any, otherwise the complete match. In your example, you could wrap the whole pattern in an outer group around the inner one. The backreference then has to point at the second group (\2) instead of the first, and you select the first group as the result:
[x for x,y in re.findall(r'((\d)\2+)', '33344555')]
returns:
['333', '44', '555']
But I personally would stick to finditer(). Why do you want to change it?
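If the goal is simply to end up with a list, here is a minimal sketch of keeping finditer() and collecting the whole matches in a list comprehension. The input string is the one from the question, and the name `runs` is only my choice for illustration:

```python
import re

# Same preprocessing as in the question: drop the spaces first.
n = '3 4 4 5 5 5 2 2'.replace(' ', '')

# group(0) is always the whole match, no matter how many
# capturing groups the pattern contains.
runs = [m.group(0) for m in re.finditer(r'(\d)\1+', n)]
print(runs)  # ['44', '555', '22']
```

This leaves the original pattern untouched, so no extra outer group is needed.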
Btw, you do not need to prepare your input by stripping the spaces:
[x for x,y in re.findall(r'((\d)(?: \2)+)', '3 3 3 4 4 5 5 5')]
returns:
['3 3 3', '4 4', '5 5 5']
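To get from there to the output format asked for in the question, something like the following sketch should do. The names `spaced_runs` and `joined` are made up for this example, and it assumes the digits are separated by exactly one space:

```python
import re

s = '3 4 4 5 5 5 2 2'

# Outer group = whole run including the spaces, inner group = the digit.
spaced_runs = [whole for whole, digit in re.findall(r'((\d)(?: \2)+)', s)]

# Remove the spaces inside each run, then join the runs with one space.
joined = ' '.join(run.replace(' ', '') for run in spaced_runs)
print(joined)  # 44 555 22
```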
Upvotes: 3
Reputation: 3279
import re
result = re.findall(r"((\d)\2+)", "34455522")
print(result) # -> [('44', '4'), ('555', '5'), ('22', '2')]
result = [elem[0] for elem in result]
print(result) # -> ['44', '555', '22']
Capture the whole run of digits in an outer group, and take only that group.
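As a rough illustration of why the extra group is needed, this sketch shows how the shape of what findall() returns depends on the number of capturing groups in the pattern. The variable names are mine, and the first pattern is just a simple group-free example:

```python
import re

s = "34455522"

# No capturing group: a list of whole matches.
no_group = re.findall(r"5+", s)

# One capturing group: a list of that group's text only.
one_group = re.findall(r"(\d)\1+", s)

# Two capturing groups: a list of tuples, one entry per group.
two_groups = re.findall(r"((\d)\2+)", s)

print(no_group)    # ['555']
print(one_group)   # ['4', '5', '2']
print(two_groups)  # [('44', '4'), ('555', '5'), ('22', '2')]
```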
Upvotes: 1
Reputation: 16566
You could capture the repeated part (\1+) in a second group too, and then use a list comprehension to join the two groups together:
>>> re.findall(r'(\d)(\1+)', n)
[('4', '4'), ('5', '55'), ('2', '2')]
>>> [''.join(i) for i in re.findall(r'(\d)(\1+)', n)]
['44', '555', '22']
Upvotes: 1