I am trying to get the key ingredients for every product under a category. This web page shows how a product's ingredients are listed: as you can see in the page source screenshot below, all the ingredients come after the br tag that follows the text "Key Ingredients:". With the code below I am able to get all of the description text, but how can I get only the key ingredients?
Expected output:
Glycerin
Sodium Palmate
Sodium Palm Kemelate
Cymbopogon Flexuosus Oil
Linalool
Coumarin
Benzyl Salicylate
Citral
Code:
import requests
from bs4 import BeautifulSoup

baseurl = "https://www.1mg.com/otc/dettol-original-bathing-soap-bar-125gm-each-buy-4-get-1-free-otc587797"
header = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/74.0.3729.169 Safari/537.36 '
}
r = requests.get(baseurl, headers=header)
# soup = BeautifulSoup(r, 'lxml')
soup = BeautifulSoup(r.content, "html.parser")
job_element = soup.find("div", class_="otc-container")
categories = job_element.find_all("a", class_="button-text Breadcrumbs__breadcrumb___XuCvk", href=True)
# print(categories)
# The product description block contains the "Key Ingredients:" list.
description = job_element.find("div", class_="ProductDescription__description-content___A_qCZ")
print(description.text)
Upvotes: 0
Views: 103
One way is to locate the "Key Ingredients:" marker, move to the next <br> tag, and then process all the elements that follow until you reach the next <br> tag.
<strong>Key Ingredients:</strong>
<br>
<ul>
...
...
</ul>
<br>
Code:
>>> for tag in description.find(string='Key Ingredients:').find_next('br').next_elements:
...     if tag.name == 'br': break
...     if tag.name == 'li': tag.get_text()
...
'Glycerin'
'Sodium Palmate'
'Sodium Palm Kemelate'
'Cymbopogon Flexuosus Oil'
'Linalool'
'Coumarin'
'Benzyl Salicylate'
'Citral'
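If you want the result as a list instead of REPL output, the same walk can be wrapped in a small function. This is only a sketch: `key_ingredients` and `SAMPLE_HTML` are hypothetical names, and the sample markup is an assumption that mirrors the structure shown above, not the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup mirroring the structure described above
# (an assumption for illustration, not the real page source).
SAMPLE_HTML = """
<div class="ProductDescription__description-content___A_qCZ">
  <strong>Product Information:</strong><br>
  Some text.<br>
  <strong>Key Ingredients:</strong>
  <br>
  <ul>
    <li>Glycerin</li>
    <li>Sodium Palmate</li>
    <li>Linalool</li>
  </ul>
  <br>
  <strong>Key Benefits:</strong>
  <br>
  <ul>
    <li>Not an ingredient</li>
  </ul>
</div>
"""


def key_ingredients(description, marker="Key Ingredients:"):
    """Return the <li> texts between the <br> after `marker` and the next <br>."""
    # Find the text node that contains the marker.
    label = description.find(string=lambda s: s is not None and marker in s)
    if label is None:
        return []
    # Move to the first <br> that follows the marker.
    start = label.find_next("br")
    if start is None:
        return []
    items = []
    for element in start.next_elements:
        # Text nodes have no tag name, so fall back to None for them.
        name = getattr(element, "name", None)
        if name == "br":  # the next <br> ends the ingredients section
            break
        if name == "li":
            items.append(element.get_text(strip=True))
    return items


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
description = soup.find("div", class_="ProductDescription__description-content___A_qCZ")
ingredients = key_ingredients(description)
print(ingredients)  # ['Glycerin', 'Sodium Palmate', 'Linalool']
```

Returning an empty list when the marker or the following <br> is missing means a product page without a "Key Ingredients:" section does not raise an error, which helps if you loop over many products in a category.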
Upvotes: 1