Reputation: 168
I have this example text snippet:
headline:
Status[apphmi]: blubb, 'Statustext1'
Main[apphmi]: bla, 'Maintext1'Main[apphmi]: blaa, 'Maintext2'
Popup[apphmi]: blaaa, 'Popuptext1'
and I want to extract the text inside the single quotes, grouped by its context (Status, Main, Popup).
My current regex is (example at pythex.org):
headline:(?:\n +Status\[apphmi\]:.* '(.*)')*(?:\n +Main\[apphmi\]:.* '(.*)')*(?:\n +Popup\[apphmi\]:.* '(.*)')*
but with this I only get 'Maintext2' and not both Main texts. I don't know how to make a group capture an arbitrary number of repetitions.
Upvotes: 0
Views: 131
Reputation: 10466
You can try this pattern, applied globally (in Python, with re.finditer or re.findall):
r"(.*?]):(?:[^']*)'([^']*)'"
For each match, group 1 and group 2 contain your key/value pair.
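To make the key/value pairs concrete, here is a minimal sketch using re.findall. It assumes the sample text has no leading indentation, as in the snippet above; the variable names are only illustrative.

```python
import re

# Pattern from this answer: group 1 is the key up to "]",
# group 2 is the text between the single quotes.
PATTERN = r"(.*?]):(?:[^']*)'([^']*)'"

# Sample text from the question (assumed to have no leading indentation).
text = (
    "headline:\n"
    "Status[apphmi]: blubb, 'Statustext1'\n"
    "Main[apphmi]: bla, 'Maintext1'Main[apphmi]: blaa, 'Maintext2'\n"
    "Popup[apphmi]: blaaa, 'Popuptext1'"
)

# With two capturing groups, findall returns one (key, value) tuple per match.
pairs = re.findall(PATTERN, text)
for key, value in pairs:
    print(key, "->", value)
```

Note that 'Main[apphmi]' shows up in two separate pairs, which is why a merging step is still needed afterwards.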
A regex alone cannot merge the two Main matches into one: a repeated capturing group keeps only its last repetition, which is why you only got 'Maintext2'. Once you have all the pairs, you can merge duplicate keys in code.
Here I have used a dictionary of lists: if a key already exists in the dictionary, append the value to its list; otherwise insert the key with a new list containing the value.
This is how it can be done (tested in Python 3):
import re

d = dict()
regex = r"(.*?]):(?:[^']*)'([^']*)'"
test_str = ("headline: \n"
            "Status[apphmi]: blubb, 'Statustext1'\n"
            "Main[apphmi]: bla, 'Maintext1'Main[apphmi]: blaa, 'Maintext2'\n"
            "Popup[apphmi]: blaaa, 'Popuptext1'")

# Each match yields a key (group 1) and its quoted text (group 2).
for match in re.finditer(regex, test_str):
    if match.group(1) in d:
        # Key seen before: append to its existing list.
        d[match.group(1)].append(match.group(2))
    else:
        # First occurrence: start a new list for this key.
        d[match.group(1)] = [match.group(2)]

print(d)
Output:
{
'Popup[apphmi]': ['Popuptext1'],
'Main[apphmi]': ['Maintext1', 'Maintext2'],
'Status[apphmi]': ['Statustext1']
}
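As a possible variation (not part of the tested code above), collections.defaultdict removes the explicit "key already exists" check. This is only a sketch: the function name group_texts is hypothetical, and it assumes the same pattern and sample text as above.

```python
import re
from collections import defaultdict

# Same pattern as above, compiled once.
PATTERN = re.compile(r"(.*?]):(?:[^']*)'([^']*)'")


def group_texts(text):
    """Collect the quoted texts per key (hypothetical helper name)."""
    grouped = defaultdict(list)  # missing keys start as an empty list
    for key, value in PATTERN.findall(text):
        grouped[key].append(value)
    return dict(grouped)


sample = (
    "headline: \n"
    "Status[apphmi]: blubb, 'Statustext1'\n"
    "Main[apphmi]: bla, 'Maintext1'Main[apphmi]: blaa, 'Maintext2'\n"
    "Popup[apphmi]: blaaa, 'Popuptext1'"
)

result = group_texts(sample)
print(result["Main[apphmi]"])  # ['Maintext1', 'Maintext2']
```

On Python 3.7 and later, dictionaries keep insertion order, so the keys come out in the order they first appear in the text.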
Upvotes: 1