Reputation: 3
I have an XML page with the following structure:
<address>
<city>Anaheim</city>
<state>California</state>
<zip>92801</zip>
<country>United States</country>
</address>
<address>
<city>Berkley</city>
<state>California</state>
<zip>94705</zip>
<country>United States</country>
</address>
I would like to get only the values of the city tags whose sibling zip tag meets a condition. For example, I need the city names where zip is 92801.
Is there a simple way to do this in Python?
Upvotes: 0
Views: 572
Reputation: 11971
If you want to use Beautiful Soup instead:
my_string = '''
<root>
<address>
<city>Anaheim</city>
<state>California</state>
<zip>92801</zip>
<country>United States</country>
</address>
<address>
<city>Berkley</city>
<state>California</state>
<zip>94705</zip>
<country>United States</country>
</address>
</root>
'''
from bs4 import BeautifulSoup
soup = BeautifulSoup(my_string, 'html.parser')
# Find every <zip> tag whose text is exactly "92801"
desired_zips = soup.find_all('zip', string="92801")
cities = []
for zip_tag in desired_zips:
    # <city> comes before <zip> inside the same <address>
    cities.append(zip_tag.find_previous_sibling('city'))
print(cities)
Output:
[<city>Anaheim</city>]
Note: you could rewrite this for loop as a list comprehension, but it looks clunky and unreadable.
Upvotes: 0
Reputation: 1406
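How about using ElementTree?
import xml.etree.ElementTree as ET
# Assumes the XML is saved to a file and has a single root element around the <address> tags
tree = ET.parse('country_data.xml')
root = tree.getroot()
filtered_cities = []
for address in root.findall('address'):
    # <zip> is a child element, not an attribute, so read its text with findtext()
    if address.findtext('zip') == '92801':
        filtered_cities.append(address.findtext('city'))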
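As a self-contained variation on the ElementTree approach, here is a minimal sketch that parses the data from a string and lets ElementTree's limited XPath support do the filtering. The wrapping `<root>` element, the `XML_TEXT` constant, and the `cities_for_zip` helper are assumptions added for illustration; they are not part of the question.

```python
import xml.etree.ElementTree as ET

# Sample data from the question, wrapped in a hypothetical <root> element
# so that the document is well-formed XML.
XML_TEXT = """
<root>
  <address>
    <city>Anaheim</city>
    <state>California</state>
    <zip>92801</zip>
    <country>United States</country>
  </address>
  <address>
    <city>Berkley</city>
    <state>California</state>
    <zip>94705</zip>
    <country>United States</country>
  </address>
</root>
"""


def cities_for_zip(xml_text, zip_code):
    """Return the text of every <city> whose sibling <zip> equals zip_code."""
    root = ET.fromstring(xml_text)
    # The [zip='...'] predicate keeps only <address> elements that have
    # a <zip> child with exactly that text.
    path = f"./address[zip='{zip_code}']/city"
    return [city.text for city in root.findall(path)]


print(cities_for_zip(XML_TEXT, "92801"))
```

This returns plain strings rather than element objects, which matches the request for the city values themselves. The zip code is interpolated into the path directly, so it should only be used with trusted input.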
Upvotes: 0
Reputation: 11971
This will achieve the desired results:
my_string = '''
<root>
<address>
<city>Anaheim</city>
<state>California</state>
<zip>92801</zip>
<country>United States</country>
</address>
<address>
<city>Berkley</city>
<state>California</state>
<zip>94705</zip>
<country>United States</country>
</address>
</root>
'''
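from lxml import etree
root = etree.fromstring(my_string)
# Select each <city> that precedes a <zip> with text "92801" in the same <address>
cities = root.xpath('.//zip[text()="92801"]/preceding-sibling::city')
# xpath() returns elements; use .text to get the city names
print([city.text for city in cities])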
Upvotes: 2