How can I parse a div and get the content of every <strong> tag in a separate row?

Question

I am trying to scrape a recipe site that groups its ingredients into separate categories, each introduced by a <strong> tag in the HTML, as shown below:

Ingrediensliste
Påskeæg med nougat (6 stk)
150 g. marcipan
ca. 40 g. nougat
150 g. mørk chokolade
50 g. lys chokolade

I managed to split the ingredients into separate columns for amount, unit and ingredient, but I am having trouble adding another column for the content inside the <strong> tags.

This is the code that I used.

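import re
import pandas as pd

# soup is the BeautifulSoup object for the recipe page (created earlier)
ingredients = soup.find('div', class_='opskriften')
#if len(ingredients.find_all('strong'))>0:
s = f"{ingredients}"
# group names were lost from this page; amount/unit/ingredient are assumed from the columns described above
r = re.compile(r"(?P<amount>\d+)\s+(?P<unit>\w+.)\s+(?P<ingredient>.+?(?=<))")
df = pd.DataFrame([m.groupdict() for m in r.finditer(s)])
# pandas expects file objects passed to to_csv to be opened with newline=''
with open("somefile.csv", 'w', newline='') as fh:
    df.to_csv(fh)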

I tried adjusting the regex but couldn't find a way to make it work.
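To see why the pattern never yields the category, here is a minimal sketch that runs the question's regex on an assumed reconstruction of one ingredient group. Both the exact markup and the group names amount, unit and ingredient are assumptions, since they were lost from this page. The pattern only matches text that starts with a number followed by a unit and whitespace, so the four ingredient lines match while the <strong> heading produces no row.

```python
import re

# Assumed markup for one ingredient group; the real tags were lost from the
# page, so this <p>/<strong>/<br/> structure is a guess.
html = (
    '<p><strong>Påskeæg med nougat (6 stk)</strong><br/>\n'
    '150 g. marcipan<br/>\n'
    'ca. 40 g. nougat<br/>\n'
    '150 g. mørk chokolade<br/>\n'
    '50 g. lys chokolade</p>'
)

# The question's pattern, with assumed group names.
pattern = re.compile(r"(?P<amount>\d+)\s+(?P<unit>\w+.)\s+(?P<ingredient>.+?(?=<))")

matches = [m.groupdict() for m in pattern.finditer(html)]
for row in matches:
    print(row)

# The "(6 stk)" in the heading is not followed by whitespace after the unit,
# so the heading never matches and no match carries the <strong> text.
```

Each dictionary holds only amount, unit and ingredient; the heading text has to be picked up separately, for example from the <strong> element itself.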

[Screenshot of the recipe page being scraped]

Dhamodharan · Accepted Answer

Here are some suggestions. There may be a parsing problem caused by the language, which is why the <br> tags are being stripped.

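from bs4 import BeautifulSoup

# Sample markup. The tags were lost when this page was extracted, so the
# <div>/<h2>/<p>/<strong>/<br/> structure is assumed from the selectors below.
soup_chunk = '''
<div class="opskriften">
<h2>Ingrediensliste</h2>
<p><strong>Påskeæg med nougat (6 stk)</strong><br/>
150 g. marcipan<br/>
ca. 40 g. nougat<br/>
150 g. mørk chokolade<br/>
50 g. lys chokolade</p>
</div>'''

soup = BeautifulSoup(soup_chunk, 'lxml')
requiredData = []
for tags in soup.find_all('p'):
    # only paragraphs with <br> line breaks hold an ingredient group
    if tags.select('br'):
        contents = {}
        # the <strong> text is the group (category) name
        contents['MainItem'] = tags.select('strong')[0].text
        # remove the <strong> so only the ingredient lines remain
        for strong in tags.select('strong'):
            strong.decompose()
        # split the remaining markup on <br/>, skip the leading "<p>" piece,
        # then drop the closing "</p>" and surrounding whitespace
        contents['SubItems'] = [i.replace("</p>", '').strip()
                                for i in str(tags).split("<br/>") if "<p>" not in i]
        requiredData.append(contents)
print(requiredData)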

I put the output in a list of dicts so it can be used anywhere.

[{'MainItem': 'Påskeæg med nougat (6 stk)', 'SubItems': ['150 g. marcipan', 'ca. 40 g. nougat', '150 g. mørk chokolade', '50 g. lys chokolade']}]
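To get the table the question asked for (one row per ingredient, with the <strong> text in its own column), the list of dicts above can be flattened and each line split with a variant of the question's regex. This is a sketch under stated assumptions: the column names category, amount, unit and ingredient are hypothetical, the pattern anchors at the end of the line instead of looking ahead for "<" (the lines no longer contain markup), lines without a leading amount are kept whole, and only the standard library csv module is used.

```python
import csv
import io
import re

# Output of the accepted answer, copied from above.
required_data = [{'MainItem': 'Påskeæg med nougat (6 stk)',
                  'SubItems': ['150 g. marcipan', 'ca. 40 g. nougat',
                               '150 g. mørk chokolade', '50 g. lys chokolade']}]

# Variant of the question's pattern for a single plain-text line (assumed names).
line_re = re.compile(r"(?P<amount>\d+)\s+(?P<unit>\w+\.?)\s+(?P<ingredient>.+)$")

rows = []
for group in required_data:
    for line in group['SubItems']:
        m = line_re.search(line)
        if m:
            row = m.groupdict()
        else:
            # assumption: lines without amount/unit keep the whole text as ingredient
            row = {'amount': '', 'unit': '', 'ingredient': line}
        row['category'] = group['MainItem']  # the <strong> text as its own column
        rows.append(row)

# Write the rows as CSV into an in-memory buffer.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=['category', 'amount', 'unit', 'ingredient'])
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```

If pandas is available, pd.DataFrame(rows) gives the same four columns and .to_csv("somefile.csv", index=False) writes them to disk. Because the pattern starts matching at the first number, a prefix such as "ca." is dropped from the row.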
