Reputation: 285
In the following example, I want to find all titles of books whose prices are 8.99. In other words, I want to find elements' text based on their sibling element's text.
from bs4 import BeautifulSoup
XML = """<?xml version="1.0"?>
<library>
<book>
<title>The Cat in the Hat</title>
<author>Dr. Seuss</author>
<price>7.35</price>
</book>
<book>
<title>Ender's Game</title>
<author>Orson Scott Card</author>
<price>8.99</price>
</book>
<book>
<title>Prey</title>
<author>Michael Crichton</author>
<price>8.99</price>
</book>
</library>
"""
soup = BeautifulSoup(XML, "xml")
Surprisingly, the query soup.find({"price": 8.99}).parent
will return the wrong book:
<book>
<title>The Cat in the Hat</title>
<author>Dr. Seuss</author>
<price>7.35</price>
</book>
Update
The query [x.parent.find("title").text for x in soup.find_all("price", text = 8.99)]
returns the list ["Ender's Game", "Prey"]
which is what I wanted. But is this the best way to do it?
Upvotes: 2
Views: 660
Reputation: 7402
You can use find_previous_sibling().
from bs4 import BeautifulSoup
XML = """<?xml version="1.0"?>
<library>
<book>
<title>The Cat in the Hat</title>
<author>Dr. Seuss</author>
<price>7.35</price>
</book>
<book>
<title>Ender's Game</title>
<author>Orson Scott Card</author>
<price>8.99</price>
</book>
<book>
<title>Prey</title>
<author>Michael Crichton</author>
<price>8.99</price>
</book>
</library>
"""
soup = BeautifulSoup(XML, "xml")
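# string= is the current name for the older text= argument (bs4 >= 4.4.0)
prices = soup.find_all("price", string="8.99")
for price in prices:
    # <title> comes before <price> inside each <book>, so search the preceding siblings
    title = price.find_previous_sibling('title')
    print(title)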
# and with list comprehension
titles = [price.find_previous_sibling('title').text for price in prices]
print(titles)
Output
<title>Ender's Game</title>
<title>Prey</title>
# List comprehension
["Ender's Game", 'Prey']
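If you would rather start from the book than from the price, here is a rough sketch using the standard library's xml.etree.ElementTree instead of BeautifulSoup (a swapped-in parser, not the approach above). Its limited XPath support includes a [child='text'] predicate, so the sibling lookup can be written as one path. The helper name titles_with_price is made up for illustration, and the XML declaration is left out of the sample string to keep it short.

```python
import math
import xml.etree.ElementTree as ET

# Same sample data as in the question, without the XML declaration.
XML = """<library>
<book><title>The Cat in the Hat</title><author>Dr. Seuss</author><price>7.35</price></book>
<book><title>Ender's Game</title><author>Orson Scott Card</author><price>8.99</price></book>
<book><title>Prey</title><author>Michael Crichton</author><price>8.99</price></book>
</library>"""

root = ET.fromstring(XML)  # root is the <library> element

# Select <book> children whose <price> child has exactly the text "8.99",
# then step down to their <title> child.
titles = [t.text for t in root.findall("./book[price='8.99']/title")]
print(titles)


def titles_with_price(library, target):
    """Hypothetical helper: compare prices as numbers rather than as text."""
    return [
        book.findtext("title")
        for book in library.iter("book")
        if math.isclose(float(book.findtext("price")), target)
    ]


print(titles_with_price(root, 8.99))
```

The path version matches the price as text, so "8.990" would not match "8.99". The helper compares numbers instead, which may be closer to what you want if the formatting of the prices varies.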
Upvotes: 1