Reputation: 287
Below is a sample XML file that I want to parse to get the value between the year tags (e.g. 2008):
<?xml version="1.0"?>
<data>
<country name="Liechtenstein">
<rank>1</rank>
<year>2008</year>
<gdppc>141100</gdppc>
<neighbor name="Austria" direction="E"/>
<neighbor name="Switzerland" direction="W"/>
</country>
<country name="Singapore">
<rank>4</rank>
<year>2011</year>
<gdppc>59900</gdppc>
<neighbor name="Malaysia" direction="N"/>
</country>
<country name="Panama">
<rank>68</rank>
<year>2011</year>
<gdppc>13600</gdppc>
<neighbor name="Costa Rica" direction="W"/>
<neighbor name="Colombia" direction="E"/>
</country>
</data>
Is there any way to extract the data between the year tags (2008, 2011, etc.) and print it using Python?
Here is the code so far:
import xml.etree.ElementTree as ET
tree = ET.parse('country_data.xml')
root = tree.getroot()
for year in root.iter('year'):
    print(year.attrib)
But when I try that code, nothing prints. Any ideas or suggestions?
Upvotes: 0
Views: 3145
Reputation: 187
You can use BeautifulSoup for this.
from bs4 import BeautifulSoup

years = []
with open('country_data.xml') as fp:
    soup = BeautifulSoup(fp, 'lxml')

# Collect the text of the <year> tag inside each <country>
for country in soup.find_all('country'):
    years_data = country.find('year')
    years.append(years_data.contents[0])

print('Years: {}'.format(years))
Output:
Years: ['2008', '2011', '2011']
Upvotes: 1
Reputation: 24930
It's fairly simple to do it using lxml:
from lxml import etree
tree = etree.parse("country_data.xml")
print(tree.xpath('//year/text()'))
Output:
['2008', '2011', '2011']
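For comparison, the code in the question can probably be kept almost as-is with the standard library. As far as I can tell, the issue is that `attrib` holds an element's attributes (like `name="Panama"`), and `<year>` has none, so it is an empty dict; the value between the tags lives in `.text`. Here is a minimal sketch, assuming the same structure as the sample file. The XML is inlined and trimmed (the `XML` string and the `years` list are names made up for this example) so it runs without `country_data.xml`:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample document, inlined for illustration only.
# With the real file, use: root = ET.parse('country_data.xml').getroot()
XML = """<?xml version="1.0"?>
<data>
  <country name="Liechtenstein"><year>2008</year></country>
  <country name="Singapore"><year>2011</year></country>
  <country name="Panama"><year>2011</year></country>
</data>"""

root = ET.fromstring(XML)

# .attrib is the attribute dict (empty for <year>); .text is the tag's content.
years = [year.text for year in root.iter('year')]
print(years)  # ['2008', '2011', '2011']
```

Note that the values come back as strings; wrap them in `int(...)` if you need numbers.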
Upvotes: 1