Reputation: 957
Consider the following HTML snippet, from which I would like to extract the values corresponding to the labels 'Price' and 'Ships from':
<div class="divName">
  <div>
    <label>Price</label>
    <div>22.99</div>
  </div>
  <div>
    <label>Ships from</label>
    <span>EU</span>
  </div>
</div>
This snippet is part of a larger HTML file. In some files the 'Ships from' label is present, in others it is not. Because the HTML content varies, I would like to use BeautifulSoup, or a similar approach, to handle this. Multiple div and span elements are present, which makes it hard to select the right ones without an id or class name.
My attempt was something like this:
from bs4 import BeautifulSoup

with open('snippet.html', 'rb') as f:
    t = f.read().decode('iso-8859-1')
s = BeautifulSoup(t, 'lxml')
s.find('div.divName[label*=Price]')
s.find('div.divName[label*=Ships from]')
However, this does not find anything: each call returns None.
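A likely reason the attempt above finds nothing: find() treats its first argument as a tag name, not as a CSS selector, so it searches for a tag literally named div.divName[label*=Price]. CSS selectors go through select()/select_one() instead, and even there [label*=Price] tests an attribute called label, which none of these elements have. A minimal sketch of the difference, with the snippet inlined as a string and parsed with the built-in html.parser (an assumption made so the example has no lxml dependency):

```python
from bs4 import BeautifulSoup

html = """<div class="divName">
<div>
<label>Price</label>
<div>22.99</div>
</div>
<div>
<label>Ships from</label>
<span>EU</span>
</div>
</div>"""

soup = BeautifulSoup(html, "html.parser")

# find() matches on the tag *name*, so a CSS selector string matches no tag.
print(soup.find("div.divName[label*=Price]"))  # None

# select() understands CSS, but [label*=Price] checks an attribute named
# "label", which the div does not have, so the result is empty too.
print(soup.select("div.divName[label*=Price]"))  # []

# Selecting the <label> elements themselves is what actually matches.
labels = soup.select("div.divName label")
print([lab.get_text() for lab in labels])  # ['Price', 'Ships from']
```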
Upvotes: 7
Views: 5139
Reputation: 84465
You can use :contains (with bs4 4.7.1 or later) and next_sibling:
from bs4 import BeautifulSoup as bs
html = '''
<div class="divName">
<div>
<label>Price</label>
<div>22.99</div>
</div>
<div>
<label>Ships from</label>
<span>EU</span>
</div>
</div>
'''
soup = bs(html, 'lxml')
items = soup.select('label:contains(Price), label:contains("Ships from")')
for item in items:
    # the first next_sibling is the whitespace text after the label; the second is the value element
    print(item.text, item.next_sibling.next_sibling.text)
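In soupsieve 2.1 and later (the selector engine behind select()), :contains() is deprecated in favour of :-soup-contains() and triggers a deprecation warning. The sketch below uses the newer spelling and also covers the question's case where 'Ships from' is missing: select_one() returns None when nothing matches, so the lookup can fall back to None instead of raising. The helper name extract_fields and its dictionary return value are hypothetical, chosen for illustration; it uses find_next_sibling() instead of next_sibling.next_sibling so that whitespace between tags does not matter.

```python
from bs4 import BeautifulSoup


def extract_fields(html, names=("Price", "Ships from")):
    """Map each requested label text to the text of the tag that follows it.

    Hypothetical helper: labels missing from the document map to None.
    """
    soup = BeautifulSoup(html, "html.parser")
    result = {}
    for name in names:
        # :-soup-contains() is the soupsieve >= 2.1 name for :contains();
        # it matches labels whose text contains the given substring.
        label = soup.select_one(f'div.divName label:-soup-contains("{name}")')
        # find_next_sibling() skips text nodes and returns the next tag (or None).
        value = label.find_next_sibling() if label is not None else None
        result[name] = value.get_text(strip=True) if value is not None else None
    return result


with_ships = """<div class="divName">
<div><label>Price</label><div>22.99</div></div>
<div><label>Ships from</label><span>EU</span></div>
</div>"""

without_ships = """<div class="divName">
<div><label>Price</label><div>22.99</div></div>
</div>"""

print(extract_fields(with_ships))     # {'Price': '22.99', 'Ships from': 'EU'}
print(extract_fields(without_ships))  # {'Price': '22.99', 'Ships from': None}
```

Note that :-soup-contains() is a substring match, so a label such as "Price (incl. VAT)" would also be picked up for "Price".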
Upvotes: 2
Reputation: 4315
Try this:
from bs4 import BeautifulSoup
from bs4.element import Tag
html = """ <div class="divName">
<div>
<label>Price</label>
<div>22.99</div>
</div>
<div>
<label>Ships from</label>
<span>EU</span>
</div>
</div>"""
s = BeautifulSoup(html, 'lxml')
row = s.find(class_='divName')
Solution 1:
for tag in row.find_all():
    # Skip container elements: len(tag) counts direct children, including whitespace text.
    if len(tag) > 1:
        continue
    # Only leaf <span>/<div> elements hold the values.
    if isinstance(tag, Tag) and tag.name in ('span', 'div'):
        print(tag.text)
Solution 2:
for lab in row.select("label"):
    print(lab.find_next_sibling().text)
Output (same for both solutions):
22.99
EU
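Solution 1 hinges on len(tag): for a BeautifulSoup Tag, len() counts the tag's direct children (its .contents), including whitespace text nodes. Wrapper divs therefore have several children and are skipped, while leaf elements such as <div>22.99</div> or <span>EU</span> contain a single text node. A small sketch on a trimmed copy of the snippet (trimmed only for brevity):

```python
from bs4 import BeautifulSoup

html = "<div><label>Price</label>\n<div>22.99</div></div>"
soup = BeautifulSoup(html, "html.parser")

wrapper = soup.div               # the outer <div>
value = wrapper.find("div")      # the inner <div>22.99</div>

# len() on a Tag is len(tag.contents): the label, the "\n" text node, the inner div.
print(len(wrapper), [type(c).__name__ for c in wrapper.contents])
# 3 ['Tag', 'NavigableString', 'Tag']

# A leaf element holds one text node, so it passes the len(tag) > 1 filter.
print(len(value), value.get_text())  # 1 22.99
```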
Upvotes: 2
Reputation: 82765
Use select to find each label, and then use find_next_sibling().text to get the text of the element that follows it.
Example:
from bs4 import BeautifulSoup
html = """<div class="divName">
<div>
<label>Price</label>
<div>22.99</div>
</div>
<div>
<label>Ships from</label>
<span>EU</span>
</div>
</div>"""
soup = BeautifulSoup(html, "html.parser")
for lab in soup.select("label"):
    print(lab.find_next_sibling().text)
Output:
22.99
EU
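Because this approach only iterates over labels that are actually present, it copes with files where 'Ships from' is missing. A minimal sketch that collects the pairs into a dictionary, so a missing field can be looked up with .get() instead of relying on output order; the variable name fields and the sample HTML without the 'Ships from' block are illustrative assumptions:

```python
from bs4 import BeautifulSoup

# A file where the 'Ships from' block is absent.
html = """<div class="divName">
<div>
<label>Price</label>
<div>22.99</div>
</div>
</div>"""

soup = BeautifulSoup(html, "html.parser")

fields = {}
for lab in soup.select("div.divName label"):
    value = lab.find_next_sibling()  # next tag after the label, or None
    if value is not None:
        fields[lab.get_text(strip=True)] = value.get_text(strip=True)

print(fields)                    # {'Price': '22.99'}
print(fields.get("Ships from"))  # None
```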
Upvotes: 4