Reputation: 13
This is what the HTML looks like:
<p class="details">
<span>detail1</span>
<span class="number">1</span>
<span>detail2</span>
<span>detail3</span>
</p>
I need to extract detail2 and detail3, but with this piece of code I only get detail1:
info = data.find("p", class_ = "details").span.text
How do I extract the needed items?
Thanks in advance!
Upvotes: 0
Views: 26
Reputation: 195438
You can find all <span> elements and use normal list slicing:
from bs4 import BeautifulSoup
html_doc = """\
<p class="details">
<span>detail1</span>
<span class="number">1</span>
<span>detail2</span>
<span>detail3</span>
</p>"""
soup = BeautifulSoup(html_doc, "html.parser")
spans = soup.find("p", class_="details").find_all("span")
for s in spans[-2:]:
    print(s.text)
Prints:
detail2
detail3
Or CSS selectors:
spans = soup.select(".details span:nth-last-of-type(-n+2)")
for s in spans:
    print(s.text)
Prints:
detail2
detail3
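As for why the original attempt only returned detail1: accessing .span on a tag behaves like calling .find("span"), so it returns just the first matching descendant. A minimal sketch reusing the HTML from the question (the variable names first_via_attr, first_via_find and all_texts are only illustrative):

```python
from bs4 import BeautifulSoup

html_doc = """\
<p class="details">
<span>detail1</span>
<span class="number">1</span>
<span>detail2</span>
<span>detail3</span>
</p>"""

soup = BeautifulSoup(html_doc, "html.parser")
details = soup.find("p", class_="details")

# Attribute-style access is shorthand for find(): it yields the first <span> only.
first_via_attr = details.span.text
first_via_find = details.find("span").text

# find_all() returns every <span> descendant, in document order.
all_texts = [s.text for s in details.find_all("span")]

print(first_via_attr)
print(first_via_find)
print(all_texts)
```

Once you have the full list from find_all(), slicing it (as above) picks out whichever items you need.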
Upvotes: 0
Reputation: 25073
Select your elements more specifically. In your case, that means all <span> siblings that follow the <span> with class number:
soup.select('span.number ~ span')
Example:
from bs4 import BeautifulSoup
html='''<p class="details">
<span>detail1</span>
<span class="number">1</span>
<span>detail2</span>
<span>detail3</span>
</p>'''
soup = BeautifulSoup(html, "html.parser")
print([t.text for t in soup.select('span.number ~ span')])
Prints:
['detail2', 'detail3']
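If you would rather not use CSS selectors, the same idea can probably be expressed with find_next_siblings(). This is only a sketch, and it assumes the <span> with class number is always present (find() returns None otherwise); the names number and following are illustrative:

```python
from bs4 import BeautifulSoup

html = '''<p class="details">
<span>detail1</span>
<span class="number">1</span>
<span>detail2</span>
<span>detail3</span>
</p>'''

soup = BeautifulSoup(html, "html.parser")

# Locate the marker element first (assumed to exist in the document).
number = soup.find("span", class_="number")

# Collect every <span> sibling that comes after the marker.
following = [s.text for s in number.find_next_siblings("span")]

print(following)
```

Both variants anchor on the marker element rather than on position, so they keep working if more <span> elements are added before it.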
Upvotes: 1