Explorer

Reputation: 1647

Beautiful soup getting all <li> tags after specific <br> tag

I am trying to get the key ingredients for every product under a category. This web page shows how a product's ingredients are listed. As the page-source screenshot below shows, all the ingredients come after the <br> tag that follows the text "Key Ingredients:". Below is my code. I am able to get all the text, but how can I get only the <li> tags right after the "Key Ingredients:" <br> tag?

    expected output:

    Glycerin
    Sodium Palmate
    Sodium Palm Kemelate
    Cymbopogon Flexuosus Oil
    Linalool
    Coumarin
    Benzyl Salicylate
    Citral
    

    Code:

    import requests
    from bs4 import BeautifulSoup
    
    baseurl = "https://www.1mg.com/otc/dettol-original-bathing-soap-bar-125gm-each-buy-4-get-1-free-otc587797"
    
    # Send a browser-like User-Agent with the request.
    header = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
                      'Chrome/74.0.3729.169 Safari/537.36'
    }
    
    r = requests.get(baseurl, headers=header)
    soup = BeautifulSoup(r.content, "html.parser")
    
    # Outer container of the product page.
    job_element = soup.find("div", class_="otc-container")
    # Breadcrumb links; find_all is the current spelling of the legacy findAll.
    categories = job_element.find_all("a", class_="button-text Breadcrumbs__breadcrumb___XuCvk", href=True)
    # print(categories)
    
    # Description block whose text includes the "Key Ingredients:" list.
    description = job_element.find("div", class_="ProductDescription__description-content___A_qCZ")
    print(description.text)
    
    

    [Screenshot of the page source]

    Upvotes: 0

    Views: 103

  • Answers (1)

    user15398259

    One way is to locate the "Key Ingredients:" marker, move to the next <br> tag, and then process every following element until you reach the next <br> tag.

    <strong>Key Ingredients:</strong>
    <br>
    <ul>
    ...
    ...
    </ul>
    <br>
    
    

    code:

    >>> for tag in description.find(string='Key Ingredients:').find_next('br').next_elements:
    ...     if tag.name == 'br': break
    ...     if tag.name == 'li': tag.get_text()
    'Glycerin'
    'Sodium Palmate'
    'Sodium Palm Kemelate'
    'Cymbopogon Flexuosus Oil'
    'Linalool'
    'Coumarin'
    'Benzyl Salicylate'
    'Citral'
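
The same walk can be wrapped in a function and tried without hitting the site. This is only a minimal sketch under stated assumptions: `SAMPLE_HTML` is a made-up fragment shaped like the snippet above (it is not the real page source), and `items_after_marker` is a hypothetical helper name, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Assumed, made-up markup that mirrors the structure described above:
# a label, a <br>, a <ul> of items, then another <br>.
SAMPLE_HTML = """
<div class="description">
  <strong>Key Ingredients:</strong><br>
  <ul>
    <li>Glycerin</li>
    <li>Sodium Palmate</li>
    <li>Linalool</li>
  </ul>
  <br>
  <strong>Key Benefits:</strong><br>
  <ul>
    <li>Protects from germs</li>
  </ul>
</div>
"""


def items_after_marker(container, marker):
    """Return the text of every <li> between the first <br> after
    `marker` and the next <br>. Hypothetical helper for illustration."""
    # Find the text node that contains the marker.
    label = container.find(string=lambda s: s is not None and marker in s)
    if label is None:
        return []
    # Move to the <br> that follows the marker.
    first_br = label.find_next("br")
    if first_br is None:
        return []
    items = []
    # Walk everything after that <br> in document order.
    for element in first_br.next_elements:
        # Text nodes have no tag name, so fall back to None for them.
        name = getattr(element, "name", None)
        if name == "br":
            break  # the next <br> closes the section
        if name == "li":
            items.append(element.get_text(strip=True))
    return items


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
ingredients = items_after_marker(soup, "Key Ingredients:")
print(ingredients)  # ['Glycerin', 'Sodium Palmate', 'Linalool']
```

With the question's code, the call would presumably be `items_after_marker(description, "Key Ingredients:")`. Returning an empty list when the marker or the `<br>` is missing avoids an `AttributeError` on products that have no such section.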
    

    Upvotes: 1
