Get repeating groups with regular expression

Question

I have a string like this:

base | text1: 0.01 | text2: 0.02 | text3: 0.03

I need to extract the first word and all of the following text-number pairs. This is the result I expect:

("base", "text1", "0.01", "text2", "0.02", "text3", "0.03")

I am trying this regexp:

r"^(\w+)(?:\s+\|\s+)(?:([\w\s]*)\:\s([0-9.]+)(?:\s+\|\s+)?)+$"

But it captures only the last text-number pair:

("base", "text3", "0.03")

Here is the full code I use:

import re

sr = "base | text1: 0.01 | text2: 0.02 | text3: 100.1"

pattern = r"^(\w+)(?:\s+\|\s+)(?:([\w\s]*)\:\s([0-9.]+)(?:\s+\|\s+)?)+$"

# re.match returns a match object, which has .groups(); re.findall returns a list, which does not
result = re.match(pattern, sr)

print(result.groups())

Thank you!

The fourth bird · Accepted Answer

One option to get the desired result is to split on either space-pipe-space or colon-space:

(?: \| |: )

Example code

import re

s = "base | text1: 0.01 | text2: 0.02 | text3: 0.03"
print(re.split(r"(?: \| |: )", s))

Output

['base', 'text1', '0.01', 'text2', '0.02', 'text3', '0.03']
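
If the spacing around the delimiters is not always exactly one space, the same split idea can be loosened a little. The following is only a sketch under that assumption; the names `parts`, `base`, `rest` and `pairs` are made up for illustration and are not part of the original answer. It also groups the flat list into (name, value) tuples, which may be easier to work with.

```python
import re

s = "base | text1: 0.01 | text2: 0.02 | text3: 0.03"

# Split on a pipe with optional surrounding whitespace,
# or on a colon followed by optional whitespace.
parts = re.split(r"\s*\|\s*|:\s*", s)

# The first item is the leading word; the rest alternate name, value, name, value, ...
base, rest = parts[0], parts[1:]

# Even positions are names, odd positions are values.
pairs = list(zip(rest[0::2], rest[1::2]))

print(base)
print(pairs)
```

Note that splitting does not validate the input: a string with a missing value would still be split, just into an odd number of items.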

Another option is to use the PyPI regex module with the \G anchor and capturing groups: the first word is in group 1, and each pair is in groups 2 and 3.

(?:^(\w+)|\G(?!^))\s+\|\s+(\w+):\s+(\d+\.\d+)
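
For context: Python's built-in re module does not support \G, and a capturing group inside a repeated group keeps only its last match, which is why the pattern in the question returns only the final pair. If installing a third-party module is not an option, a rough equivalent can be sketched with re alone in two steps: match the first word, then collect every pair with re.findall. The variable names below (`head`, `pairs`, `result`) are illustrative assumptions, not part of the original answer.

```python
import re

s = "base | text1: 0.01 | text2: 0.02 | text3: 0.03"

# Step 1: the leading word at the start of the string.
head = re.match(r"\w+", s)

# Step 2: every "| name: number" pair; findall returns one tuple per pair.
pairs = re.findall(r"\|\s+(\w+):\s+(\d+\.\d+)", s)

# Flatten into the shape asked for in the question.
result = [head.group(0)]
for name, value in pairs:
    result.extend([name, value])

print(result)
```

Unlike the \G version, this sketch does not require the pairs to follow each other without gaps, so it is more permissive about malformed input.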
