Get repeating groups with regular expression

I have a string like this:

base | text1: 0.01 | text2: 0.02 | text3: 0.03

I need to extract the first word and all the other text-number pairs. This is the result I expect:

("base", "text1", "0.01", "text2", "0.02", "text3", "0.03")

I'm trying this regex:

r"^(\w+)(?:\s+\|\s+)(?:([\w\s]*)\:\s([0-9.]+)(?:\s+\|\s+)?)+$"

But it captures only the last text-number pair:

("base", "text3", "0.03")
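This is how Python's re module treats a capturing group inside a quantified group: each repetition overwrites the previous capture, so only the last iteration is reported. A minimal illustration with a made-up input string:

```python
import re

# The outer (?:...)+ repeats three times, but group 1 is overwritten
# on each repetition, so only the last digit survives.
m = re.match(r"(?:(\d),?)+", "1,2,3")
print(m.group(1))  # prints 3
print(m.groups())  # prints ('3',)
```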

Here is the full code I use:

import re

sr = "base | text1: 0.01 | text2: 0.02 | text3: 100.1"

pattern = r"^(\w+)(?:\s+\|\s+)(?:([\w\s]*)\:\s([0-9.]+)(?:\s+\|\s+)?)+$"

# re.findall returns a list, which has no .groups(); re.match returns a match object
result = re.match(pattern, sr)

print(result.groups())

Thank you!

Upvotes: 1

Views: 122

Answers (2)

Alexander Mashin

Reputation: 4564

I suggest something like this:

import re

sr = "base | text1: 0.01 | text2: 0.02 | text3: 100.1"

# Validate the whole line and capture the first word plus the remaining text.
pattern1 = r"^(\w+)((?:\s+\|\s+[\w\s]+\s*:\s*\d+\.\d+)+)$"
bases = re.findall(pattern1, sr)

for base in bases:
    result = [base[0]]
    # Pull each "name: value" pair out of the captured remaining text.
    pattern2 = r"\|\s+([\w\s]+)\s*:\s*(\d+\.\d+)"
    texts = re.findall(pattern2, base[1])
    for text in texts:
        result.append(text[0])
        result.append(text[1])
    print(result)

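Note the simplified regular expressions: the first one checks the whole line and captures the first word plus the remaining text, and the second one extracts each name/value pair from that remaining text.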
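The same two-step idea (validate the whole line, then collect the pairs) can be written a little more compactly. This is a sketch that assumes the values are always decimals like 0.01; the function name parse_line is hypothetical.

```python
import re

# Whole-line check: a base word followed by one or more " | name: number" pairs.
LINE = re.compile(r"(\w+)((?:\s+\|\s+[\w\s]+?\s*:\s*\d+\.\d+)+)")
# One pair; the lazy [\w\s]+? keeps spaces before the colon out of the name.
PAIR = re.compile(r"\|\s+([\w\s]+?)\s*:\s*(\d+\.\d+)")


def parse_line(line):
    """Return (base, name1, value1, ...) or None if the line is malformed."""
    m = LINE.fullmatch(line)
    if m is None:
        return None
    # Flatten the list of (name, value) tuples into one sequence.
    pairs = [item for pair in PAIR.findall(m.group(2)) for item in pair]
    return (m.group(1), *pairs)


print(parse_line("base | text1: 0.01 | text2: 0.02 | text3: 0.03"))
# ('base', 'text1', '0.01', 'text2', '0.02', 'text3', '0.03')
```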

Upvotes: 0

The fourth bird

Reputation: 163632

One option to get the desired result is to split on either " | " (space, pipe, space) or ": " (colon, space):

(?: \| |: )


Example code

import re

s = "base | text1: 0.01 | text2: 0.02 | text3: 0.03"
print(re.split(r"(?: \| |: )", s))

Output

['base', 'text1', '0.01', 'text2', '0.02', 'text3', '0.03']
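Because re.split returns a flat list, the name/value pairs can be regrouped with slicing if a mapping is more convenient. This is a sketch that assumes the input is well formed; re.split does not validate the line the way a full-match pattern would.

```python
import re

s = "base | text1: 0.01 | text2: 0.02 | text3: 0.03"
parts = re.split(r"(?: \| |: )", s)

base, rest = parts[0], parts[1:]
# Even positions in rest are names, odd positions are values.
pairs = dict(zip(rest[::2], rest[1::2]))
print(base)   # base
print(pairs)  # {'text1': '0.01', 'text2': '0.02', 'text3': '0.03'}
```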

Another option is to use the PyPI regex module and make use of the \G anchor with capturing groups, where the first word is in group 1 and the pairs are in groups 2 and 3.

(?:^(\w+)|\G(?!^))\s+\|\s+(\w+):\s+(\d+\.\d+)

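Python's built-in re module does not support \G. A rough stdlib analogue of the same "continue exactly where the last match ended" idea is to call a compiled pattern's match() method with an explicit pos argument in a loop. This is a swapped-in technique, not the \G pattern itself, and the function name parse_chained is hypothetical.

```python
import re

HEAD = re.compile(r"\w+")
# Same pair syntax as the \G pattern above, without the anchors.
PAIR = re.compile(r"\s+\|\s+(\w+):\s+(\d+\.\d+)")


def parse_chained(s):
    """Match the base word at 0, then each pair exactly where the previous match ended."""
    m = HEAD.match(s)
    if m is None:
        return None
    result = [m.group()]
    pos = m.end()
    while True:
        # Pattern.match(string, pos) only succeeds if the match starts at pos,
        # which plays the role of \G here.
        m = PAIR.match(s, pos)
        if m is None:
            break
        result.extend(m.groups())
        pos = m.end()
    # Reject lines with unparsed trailing text.
    return tuple(result) if pos == len(s) else None


print(parse_chained("base | text1: 0.01 | text2: 0.02 | text3: 0.03"))
# ('base', 'text1', '0.01', 'text2', '0.02', 'text3', '0.03')
```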

Upvotes: 2
