Reputation: 1879
Hello, I have the following data:
hello this is a car
<hamburguer>this car is very good<\hamburguer>I want to fill this rules
this pencil is red and very good, the movie was very fine
<red>the color is blue and green<\red>
<blue>your favorite color is the yellow<\blue>you want this<red>my smartphone is very expensive<\red>
From this data I built the following list:
lines = ['hello this is a car','<hamburguer>this car is very good<\hamburguer>I want to fill this rules','this pencil is red and very good, the movie was very fine','<red>the color is blue and green<\red>','<blue>your favorite color is the yellow<\blue>you want this<red>my smartphone is very expensive<\red>']
I would like to build the following dictionary from this list. This is my expected output:
dict_tags = {'<hamburguer>': ['this car is very good'], '<red>': ['the color is blue and green', 'my smartphone is very expensive'], '<blue>': ['your favorite color is the yellow']}
Since I don't know how to proceed, I tried the following:
import re

list_tags = []
for line in lines:
    # .* is greedy: it matches from the first '>' to the last '<' on the line
    pattern = re.search(r"(?<=>)(.*)(?=<)", line)
    if pattern:
        list_tags.append(pattern.group())
However, the issue is that I only got:
['this car is very good', 'the color is blue and green', 'your favorite color is the yellow<\x08lue>you want this<red>my smartphone is very expensive']
So I need help building the dictionary described above. Thanks for the support. I need the data that is between the tags. For instance, from:
<red>the color is blue and green<\red>
I need to extract the tag:
<red>
and the information:
the color is blue and green
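The over-long third match and the `\x08` in it can be reproduced in isolation. The sketch below uses a shortened, hypothetical sample line (not the full data) and assumes the cause is twofold: `'\b'` in a non-raw string literal is a backspace character, and `.*` is greedy.

```python
import re

# Non-raw literal: '\b' becomes a backspace (\x08) and '\r' a carriage return
plain = '<blue>yellow<\blue>you want this<red>expensive<\red>'
# Raw literal: the backslashes are kept exactly as written
raw = r'<blue>yellow<\blue>you want this<red>expensive<\red>'

print('\x08' in plain)  # True
print('\\b' in raw)     # True

# Greedy .* runs to the last '<'; lazy .*? stops at the first one
greedy = re.search(r'(?<=>)(.*)(?=<)', raw).group()
lazy = re.search(r'(?<=>)(.*?)(?=<)', raw).group()
print(greedy)
print(lazy)  # yellow
```

The lazy variant only returns the first tagged span on a line, so it does not by itself yield the tag names or later spans.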
Upvotes: 1
Views: 41
Reputation: 3405
Using only re.finditer.
Regex: <([^>]+)>([^>]+)<\\\1>
import re

# text: the raw data as one string, with literal backslashes in the closing tags
lst = {}
for item in re.finditer(r'<([^>]+)>([^>]+)<\\\1>', text):
    lst.setdefault('<%s>' % item.group(1), []).append(item.group(2))
Output:
{'<red>': ['the color is blue and green', 'my smartphone is very expensive'], '<blue>': ['your favorite color is the yellow'], '<hamburguer>': ['this car is very good']}
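A self-contained sketch of this approach applied line by line to the list from the question. It assumes the lines are stored as raw strings, so that `\r` and `\b` in the closing tags stay literal; the name `dict_tags` is taken from the question.

```python
import re

# Raw strings (an assumption) keep the backslashes in the closing tags literal
lines = [
    r'hello this is a car',
    r'<hamburguer>this car is very good<\hamburguer>I want to fill this rules',
    r'this pencil is red and very good, the movie was very fine',
    r'<red>the color is blue and green<\red>',
    r'<blue>your favorite color is the yellow<\blue>you want this<red>my smartphone is very expensive<\red>',
]

dict_tags = {}
for line in lines:
    # group(1) is the tag name, group(2) the text up to the matching <\name>
    for m in re.finditer(r'<([^>]+)>([^>]+)<\\\1>', line):
        dict_tags.setdefault('<%s>' % m.group(1), []).append(m.group(2))

print(dict_tags)
```

Lines without any tag pair simply produce no matches, so they need no special handling.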
Upvotes: 1
Reputation: 92854
With the re.findall() function and a collections.defaultdict object:
import re, collections
s = '''hello this is a car
<hamburguer>this car is very good<\\hamburguer>I want to fill this rules
this pencil is red and very good, the movie was very fine
<red>the color is blue and green<\\red>
<blue>your favorite color is the yellow<\\blue>you want this<red>my smartphone is very expensive<\\red>'''
tags_dict = collections.defaultdict(list)
tags = re.findall(r'<([^>]+)>([^<>]+)(<\\\1>)', s) # find all tags
for tag_open, value, tag_close in tags:
    tags_dict[tag_open].append(value)  # accumulate values for the same tag
print(dict(tags_dict))
The output:
{'hamburguer': ['this car is very good'], 'red': ['the color is blue and green', 'my smartphone is very expensive'], 'blue': ['your favorite color is the yellow']}
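The keys above are bare tag names, while the question's expected output keeps the angle brackets. A minimal sketch of one way to get that form, wrapped in a hypothetical helper `extract_tags` (not part of the answer) and run on a shortened sample string:

```python
import re
from collections import defaultdict

def extract_tags(text):
    """Group tagged text by its opening tag, e.g. '<red>' (hypothetical helper)."""
    tags = defaultdict(list)
    # Two capture groups only: tag name and enclosed text
    for name, value in re.findall(r'<([^<>]+)>([^<>]*)<\\\1>', text):
        tags['<' + name + '>'].append(value)
    return dict(tags)

sample = r'<blue>yellow<\blue>you want this<red>expensive<\red>'
print(extract_tags(sample))  # {'<blue>': ['yellow'], '<red>': ['expensive']}
```

Dropping the third capture group is a design choice: the closing tag is still matched through the backreference, but it no longer appears in each result tuple.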
Upvotes: 2