Jeroen

Reputation: 957

Find div text through div label with beautifulsoup

Assume the following html snippet, from which I would like to extract the values corresponding to the labels 'price' and 'ships from':

<div class="divName">
    <div>
        <label>Price</label>
        <div>22.99</div>
    </div>
    <div>
        <label>Ships from</label>
        <span>EU</span>
    </div>
</div>

This snippet is part of a larger HTML file. In some files the 'Ships from' label is present, in others it is not. I would like to use BeautifulSoup, or a similar approach, to handle this variability. The file contains many div and span elements, which makes it hard to select the right ones without an id or class name.

My thoughts, something like this:

t = open('snippet.html', 'rb').read().decode('iso-8859-1')
s = BeautifulSoup(t, 'lxml')
s.find('div.divName[label*=Price]')
s.find('div.divName[label*=Ships from]')

However, this returns an empty list.

Upvotes: 7

Views: 5139

Answers (3)

QHarr

Reputation: 84465

You can use :contains (with bs4 4.7.1+) and next_sibling:

from bs4 import BeautifulSoup as bs

html = '''
<div class="divName">
    <div>
        <label>Price</label>
        <div>22.99</div>
    </div>
    <div>
        <label>Ships from</label>
        <span>EU</span>
    </div>
</div>
'''

soup = bs(html, 'lxml')
items = soup.select('label:contains(Price), label:contains("Ships from")')

for item in items:
    print(item.text, item.next_sibling.next_sibling.text)
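Since the question says the 'Ships from' label is sometimes missing, here is a minimal sketch of the same CSS idea that looks up one label at a time and returns None when it is absent. The helper name value_after_label is hypothetical, and the sketch assumes a Soup Sieve release (2.1 or later) that provides :-soup-contains, the newer spelling of :contains. It uses find_next_sibling() instead of chaining next_sibling so that whitespace between the tags does not matter.

```python
from bs4 import BeautifulSoup

# Sample page where the 'Ships from' block is missing
html = """
<div class="divName">
    <div>
        <label>Price</label>
        <div>22.99</div>
    </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")


def value_after_label(soup, text):
    """Hypothetical helper: text of the tag following the matching label, or None."""
    # select_one returns None when no label under div.divName contains the text
    label = soup.select_one(f'div.divName label:-soup-contains("{text}")')
    if label is None:
        return None
    # find_next_sibling() skips whitespace text nodes and returns the next tag
    sibling = label.find_next_sibling()
    return sibling.get_text(strip=True) if sibling is not None else None


print(value_after_label(soup, "Price"))
print(value_after_label(soup, "Ships from"))
```

Note that the match is a substring match, so "Price" would also match a label such as "Price per unit".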

Upvotes: 2

bharatk

Reputation: 4315

Try this:

from bs4 import BeautifulSoup
from bs4.element import Tag

html = """ <div class="divName">
    <div>
        <label>Price</label>
        <div>22.99</div>
    </div>
    <div>
        <label>Ships from</label>
        <span>EU</span>
    </div>
</div>"""

s = BeautifulSoup(html, 'lxml')
row = s.find(class_='divName')

Solution-1:

for tag in row.find_all(True):
    # Skip wrapper elements that hold more than one child node
    if len(tag.contents) > 1:
        continue
    # Print only the value elements (the leaf span/div after each label)
    if tag.name in ('span', 'div'):
        print(tag.text)

Solution-2:

for lab in row.select("label"):
    print(lab.find_next_sibling().text)

Output:

22.99
EU
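Both solutions above print every value under divName. If only specific labels are wanted, one option is to match the label text exactly, ignoring case and surrounding whitespace, by passing a function as the string argument of find. This is only a sketch: find_value and matches are hypothetical names, and it assumes each label holds plain text with no nested tags.

```python
from bs4 import BeautifulSoup

html = """<div class="divName">
    <div>
        <label>Price</label>
        <div>22.99</div>
    </div>
    <div>
        <label>Ships from</label>
        <span>EU</span>
    </div>
</div>"""

row = BeautifulSoup(html, "html.parser").find(class_="divName")


def find_value(container, wanted):
    """Hypothetical helper: value next to the label whose text equals `wanted`."""
    def matches(text):
        # `text` is the label's .string, which is None if the label has nested tags
        return text is not None and text.strip().lower() == wanted.lower()

    label = container.find("label", string=matches)
    if label is None:
        return None  # label not present in this file
    value = label.find_next_sibling()
    return value.get_text(strip=True) if value is not None else None


price = find_value(row, "price")
ships_from = find_value(row, "ships from")
missing = find_value(row, "weight")
print(price, ships_from, missing)
```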

Upvotes: 2

Rakesh

Reputation: 82765

Use select to find each label, then call find_next_sibling().text to get the value that follows it.

Example:

from bs4 import BeautifulSoup

html = """<div class="divName">
    <div>
        <label>Price</label>
        <div>22.99</div>
    </div>
    <div>
        <label>Ships from</label>
        <span>EU</span>
    </div>
</div>"""

soup = BeautifulSoup(html, "html.parser")
for lab in soup.select("label"):
    print(lab.find_next_sibling().text)

Output:

22.99
EU
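To tell which value belongs to which label, and to cope with files where 'Ships from' is missing, the same loop can collect the pairs into a dict. This is a rough sketch rather than a definitive solution; the names container and values are illustrative, and "Weight" is a made-up label used only to show the missing case.

```python
from bs4 import BeautifulSoup

html = """<div class="divName">
    <div>
        <label>Price</label>
        <div>22.99</div>
    </div>
    <div>
        <label>Ships from</label>
        <span>EU</span>
    </div>
</div>"""

soup = BeautifulSoup(html, "html.parser")
container = soup.find("div", class_="divName")

# Map each label's text to the text of the tag that follows it
values = {}
for lab in container.select("label"):
    sibling = lab.find_next_sibling()
    if sibling is not None:
        values[lab.get_text(strip=True)] = sibling.get_text(strip=True)

# dict.get returns None for labels that are not in the file
print(values.get("Price"))
print(values.get("Ships from"))
print(values.get("Weight"))
```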

Upvotes: 4
