neo33

Reputation: 1879

How to build the following dictionary from a list using regex?

Hello, I have the following data:

hello this is a car
<hamburguer>this car is very good<\hamburguer>I want to fill this rules 
this pencil is red and very good, the movie was very fine
<red>the color is blue and green<\red>
<blue>your favorite color is the yellow<\blue>you want this<red>my smartphone is very expensive<\red>

From this data I got the following list:

lines = ['hello this is a car','<hamburguer>this car is very good<\hamburguer>I want to fill this rules','this pencil is red and very good, the movie was very fine','<red>the color is blue and green<\red>','<blue>your favorite color is the yellow<\blue>you want this<red>my smartphone is very expensive<\red>'] 

I would like to build the following dictionary from this list. This is my expected output:

dict_tags = {'<hamburguer>': ['this car is very good'], '<red>': ['the color is blue and green', 'my smartphone is very expensive'], '<blue>': ['your favorite color is the yellow']}

Since I don't have any idea how to proceed, I tried the following:

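import re

list_tags = []
for line in lines:
    # greedy search: grabs everything between the first '>' and the last '<' of the line
    pattern = re.search(r"(?<=>)(.*)(?=<)",line)
    if pattern:
        list_tags.append(pattern.group())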

However, the issue is that I just got:

['this car is very good', 'the color is blue and green', 'your favorite color is the yellow<\x08lue>you want this<red>my smartphone is very expensive']

So I need help building this dictionary. I need the data that is between the tags. For instance, from:

<red>the color is blue and green<\red>

I need to extract the tag:

<red>

and the information:

the color is blue and green

Upvotes: 1

Views: 41

Answers (2)

Srdjan M.

Reputation: 3405

Using only re.finditer.

Regex: <([^>]+)>([^>]+)<\\\1>

import re

# `text` is the input data as a single string
result = {}
for item in re.finditer(r'<([^>]+)>([^>]+)<\\\1>', text):
    result.setdefault('<%s>' % item.group(1), []).append(item.group(2))

Output:

{'<red>': ['the color is blue and green', 'my smartphone is very expensive'], '<blue>': ['your favorite color is the yellow'], '<hamburguer>': ['this car is very good']}
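
For reference, the pattern above can be spelled out piece by piece with re.VERBOSE. This is only a minimal sketch: the names TAG_PATTERN, text, and result are illustrative, and it assumes the closing tags are written with a backslash (<\red>) exactly as in the question, with the sample held in a raw string.

```python
import re

# Same pattern as above, written with re.VERBOSE so each part can be commented.
TAG_PATTERN = re.compile(r"""
    <([^>]+)>    # opening tag; group 1 captures the tag name
    ([^>]+)      # group 2: the enclosed text (any run of characters without a closing angle bracket)
    <\\\1>       # closing tag: a literal backslash followed by the same name as group 1
""", re.VERBOSE)

# Raw string (assumption), so the backslashes in the closing tags stay literal.
text = r"<blue>your favorite color is the yellow<\blue>you want this<red>my smartphone is very expensive<\red>"

result = {}
for match in TAG_PATTERN.finditer(text):
    # setdefault creates the list the first time a tag is seen
    result.setdefault('<%s>' % match.group(1), []).append(match.group(2))

print(result)
```

Because the closing tag must repeat the name captured by group 1, text that sits between two different tags (here "you want this") is skipped.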


Upvotes: 1

RomanPerekhrest

Reputation: 92854

With the re.findall() function and a collections.defaultdict object:

import re, collections

s = '''hello this is a car
<hamburguer>this car is very good<\\hamburguer>I want to fill this rules 
this pencil is red and very good, the movie was very fine
<red>the color is blue and green<\\red>
<blue>your favorite color is the yellow<\\blue>you want this<red>my smartphone is very expensive<\\red>'''

tags_dict = collections.defaultdict(list)
tags = re.findall(r'<([^>]+)>([^<>]+)(<\\\1>)', s)    # find all tags

for tag_open, value, tag_close in tags:
    tags_dict[tag_open].append(value)    # accumulate values for the same tag

print(dict(tags_dict))

The output:

{'hamburguer': ['this car is very good'], 'red': ['the color is blue and green', 'my smartphone is very expensive'], 'blue': ['your favorite color is the yellow']}
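
The question starts from a list of lines and expects keys that include the angle brackets, so a small variant of this approach may be closer to the requested output. The following is a sketch under assumptions: the function name build_tag_dict is hypothetical, and the list items are written as raw strings so that sequences such as \r and \b are not interpreted as carriage return and backspace (which is where the \x08 in the question's output comes from).

```python
import re
from collections import defaultdict

# Opening tag, enclosed text without angle brackets, closing tag with a backslash.
TAG_RE = re.compile(r'<([^>]+)>([^<>]+)<\\\1>')

def build_tag_dict(lines):
    # Map '<tag>' to the list of texts enclosed by that tag, in order of appearance.
    tags = defaultdict(list)
    for line in lines:
        for name, value in TAG_RE.findall(line):
            tags['<%s>' % name].append(value)
    return dict(tags)

# Raw strings (assumption) keep the backslashes in the closing tags literal.
lines = [
    r'hello this is a car',
    r'<hamburguer>this car is very good<\hamburguer>I want to fill this rules',
    r'this pencil is red and very good, the movie was very fine',
    r'<red>the color is blue and green<\red>',
    r'<blue>your favorite color is the yellow<\blue>you want this<red>my smartphone is very expensive<\red>',
]

dict_tags = build_tag_dict(lines)
print(dict_tags)
```

Processing the list line by line avoids joining it into one string first, and the closing tag is left out of the capture groups since only the name and the enclosed text are needed.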

Upvotes: 2
