Reputation: 168

Repeated regex groups of arbitrary number

I have this example text snippet

headline:
        Status[apphmi]: blubb, 'Statustext1'
        Main[apphmi]: bla, 'Maintext1'Main[apphmi]: blaa, 'Maintext2'
        Popup[apphmi]: blaaa, 'Popuptext1'

and I want to extract the words within the single quotes, grouped by context (Status, Main, Popup).

My current regex is (example at pythex.org):

headline:(?:\n +Status\[apphmi\]:.* '(.*)')*(?:\n +Main\[apphmi\]:.* '(.*)')*(?:\n +Popup\[apphmi\]:.* '(.*)')*

but with this I only get 'Maintext2' and not both. I don't know how to repeat the groups an arbitrary number of times.

Upvotes: 0

Answers (1)

Mustofa Rizwan

Reputation: 10466

You can try this pattern, applied to every match in the text (the global flag in a regex tester, re.finditer in Python):

r"(.*?]):(?:[^']*)'([^']*)'"

In each match, group 1 and group 2 contain your key/value pair.

You cannot merge the two Main matches into one with the regex alone. Once you have all the pairs, you can merge duplicate keys in code.
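To illustrate why a single pattern cannot return both values: in Python's re module, a capturing group inside a repetition keeps only the text from its last iteration. The following is a minimal sketch, assuming a shortened, hypothetical sample string rather than the full input from the question:

```python
import re

# Shortened, hypothetical sample: two quoted values in a row.
sample = "Main: 'Maintext1'Main: 'Maintext2'"

# The capturing group sits inside a repeated non-capturing group,
# as in the pattern from the question.
repeated = re.match(r"(?:Main: '([^']*)')*", sample)
last_only = repeated.group(1)  # only the final iteration is kept

# Matching one pair at a time and collecting every hit keeps both values.
all_values = re.findall(r"Main: '([^']*)'", sample)

print(last_only)
print(all_values)
```

This is why matching one pair per match and grouping afterwards is the more practical route.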

The code below uses a dictionary of lists: if a key already exists in the dictionary, the value is appended to its list; otherwise a new key is inserted with a new list containing the value.

This is how it can be done (tested in Python 3):

import re

d = dict()
regex = r"(.*?]):(?:[^']*)'([^']*)'"

test_str = ("headline:        \n"
    "Status[apphmi]: blubb, 'Statustext1'\n"
    "Main[apphmi]: bla, 'Maintext1'Main[apphmi]: blaa, 'Maintext2'\n"
    "Popup[apphmi]: blaaa, 'Popuptext1'")

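matches = re.finditer(regex, test_str)

# Group 1 is the key (e.g. Main[apphmi]); group 2 is the quoted text.
for match in matches:
    if match.group(1) in d:
        # Key already seen: append to its existing list.
        d[match.group(1)].append(match.group(2))
    else:
        # First occurrence: start a new list for this key.
        d[match.group(1)] = [match.group(2)]
print(d)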

Output (line breaks added for readability; key order may vary with the Python version):

{
'Popup[apphmi]': ['Popuptext1'], 
'Main[apphmi]': ['Maintext1', 'Maintext2'], 
'Status[apphmi]': ['Statustext1']
}
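If the input keeps the leading indentation shown in the question, group 1 of the pattern above would also pick up that whitespace, because .*? starts matching at the beginning of the line. One possible variant, sketched here under the assumption that every context name is a plain word followed by the literal [apphmi], captures only the name and uses collections.defaultdict for the grouping. The names pair and grouped are illustrative and not part of the original answer:

```python
import re
from collections import defaultdict

# Input laid out as in the question, including the leading indentation.
text = (
    "headline:\n"
    "        Status[apphmi]: blubb, 'Statustext1'\n"
    "        Main[apphmi]: bla, 'Maintext1'Main[apphmi]: blaa, 'Maintext2'\n"
    "        Popup[apphmi]: blaaa, 'Popuptext1'\n"
)

# (\w+) captures only the context name, so indentation never enters the key.
# Assumption: each entry has the form Name[apphmi]: ..., 'quoted text'.
pair = re.compile(r"(\w+)\[apphmi\]:[^']*'([^']*)'")

grouped = defaultdict(list)
for context, quoted in pair.findall(text):
    # defaultdict creates the empty list on first access to a new key.
    grouped[context].append(quoted)

print(dict(grouped))
```

With this variant the keys are Status, Main and Popup, and Main holds both 'Maintext1' and 'Maintext2'.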

Upvotes: 1
