Fomalhaut

Reputation: 9745

How do I collect values into a list in Python standard regex?

I have a string with repeated parts:

s = '[1][2][5] and [3][8]'

And I want to group the numbers into two lists using re.match. The expected result is:

{'x': ['1', '2', '5'], 'y': ['3', '8']}

I tried this expression that gives a wrong result:

re.match(r'^(?:\[(?P<x>\d+)\])+ and (?:\[(?P<y>\d+)\])+$', s).groupdict()
# {'x': '5', 'y': '8'}

It looks like re.match keeps the last match only. How do I collect all the parts into a list instead of the last one only?

I know I could split the line on the ' and ' separator and use re.findall on each part, but that approach is not general enough: for more complex strings I would have to work out the correct splitting separately each time.

Upvotes: 1

Views: 59

Answers (2)

The fourth bird

Reputation: 163362

To keep the named capture groups, put the repetition inside each named group, so that each group captures the whole run of bracketed numbers rather than only the last one.

After checking that the pattern matched, run re.findall on each value from groupdict() to extract the digits. The pattern:

^(?P<x>(?:\[\d+])+) and (?P<y>(?:\[\d+])+)$


Example

import re

s = '[1][2][5] and [3][8]'
m = re.match(r'^(?P<x>(?:\[\d+])+) and (?P<y>(?:\[\d+])+)$', s)

if m:
    dct = {k: re.findall(r"\d+", v) for k, v in m.groupdict().items()}
    print(dct)

Output

{'x': ['1', '2', '5'], 'y': ['3', '8']}
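
The same two-step idea (match the whole run in a named group, then split each captured run with re.findall) can be wrapped in a small helper. This is only a sketch: the name collect_groups and its item_pattern parameter are hypothetical, and it assumes every named group in the pattern captures a run of items. A named group that did not take part in the match is mapped to an empty list.

```python
import re


def collect_groups(pattern, item_pattern, text):
    """Return {group_name: [items, ...]}, or None if text does not match.

    Hypothetical helper: `pattern` must use named groups that each capture
    a whole run of items; `item_pattern` extracts the items from one run.
    """
    m = re.match(pattern, text)
    if m is None:
        return None
    # groupdict() yields None for a group that did not participate,
    # so fall back to an empty string before calling re.findall.
    return {
        name: re.findall(item_pattern, value or "")
        for name, value in m.groupdict().items()
    }


pattern = r'^(?P<x>(?:\[\d+])+) and (?P<y>(?:\[\d+])+)$'
print(collect_groups(pattern, r'\d+', '[1][2][5] and [3][8]'))
```

Returning None for a failed match keeps the "first check if there is a match" step explicit at the call site.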

Upvotes: 1

Tim Biegeleisen

Reputation: 521259

Iterate over the input string with re.finditer, looking for runs of bracketed numbers such as [3][8]. For each match, use re.findall to get the list of number strings and store it under the next key. The keys come from a list, and each one is popped as it is used.

import re

s = '[1][2][5] and [3][8]'
keys = ['x', 'y']
d = {}
# Raw string, so \[ and \d reach the regex engine without invalid-escape warnings
for m in re.finditer(r'(?:\[\d+\])+', s):
    d[keys.pop(0)] = re.findall(r'\d+', m.group())

print(d)  # {'x': ['1', '2', '5'], 'y': ['3', '8']} (insertion order, Python 3.7+)
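
A variant of the loop above, sketched under the assumption that the number of bracket runs should equal the number of keys: pairing the keys and runs with zip leaves the key list unmodified, and an explicit length check replaces the IndexError that keys.pop(0) raises when there are more runs than keys. The function name label_runs is hypothetical.

```python
import re


def label_runs(text, keys):
    """Pair each run of bracketed numbers in text with a key, in order."""
    # With no capturing group in the pattern, re.findall returns whole matches.
    runs = re.findall(r'(?:\[\d+\])+', text)
    if len(runs) != len(keys):
        raise ValueError(f'expected {len(keys)} runs, found {len(runs)}')
    return {key: re.findall(r'\d+', run) for key, run in zip(keys, runs)}


keys = ['x', 'y']
print(label_runs('[1][2][5] and [3][8]', keys))
```

Unlike the anchored re.match pattern, this approach ignores whatever text sits between the runs, so it does not validate the ' and ' separator.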

Upvotes: 1
