Reputation: 1569
I have an XML file with the following data:
<year>2013</year>
<youSaveSpend>2500</youSaveSpend>
<yourMpgVehicle>
<avgMpg>32.261695541</avgMpg>
<cityPercent>43</cityPercent>
<highwayPercent>57</highwayPercent>
</yourMpgVehicle>
<year>2013</year>
<youSaveSpend>3000</youSaveSpend>
<yourMpgVehicle>
<avgMpg>33.383275416</avgMpg>
<cityPercent>49</cityPercent>
<highwayPercent>51</highwayPercent>
</yourMpgVehicle>
<year>2012</year>
<youSaveSpend>2500</youSaveSpend>
<yourMpgVehicle>
<avgMpg>36.210640188</avgMpg>
<cityPercent>32</cityPercent>
<highwayPercent>68</highwayPercent>
</yourMpgVehicle>
I want to use BeautifulSoup to return a list of the avgMpg values for year 2013 only. How can I do that?
My current effort has been:
for item in soupedCarAvgMpgPage.findAll('year'):
    listOfYears.append(''.join(item.findAll(text=True)))
for item in soupedCarAvgMpgPage.findAll('avgmpg'):
    listOfAvgMpg.append(''.join(item.findAll(text=True)))
print listOfYears
print listOfAvgMpg
dictionaryYearToAvgMpg = dict(zip(listOfYears, listOfAvgMpg))
but a dictionary cannot hold duplicate keys, so the two 2013 entries collapse into one :S
Upvotes: 1
Views: 2258
Reputation: 396
Since we know each yourMpgVehicle element follows shortly after its year element, we can reach it by searching through the year's next_siblings:
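from bs4 import BeautifulSoup

with open('mpg.xml') as f:
    contents = f.read()

# An HTML parser lowercases tag names, which is why the lookups below use
# 'yourmpgvehicle' and 'avgmpg' rather than the mixed-case originals.
mpgs = BeautifulSoup(contents, 'html.parser')

def find_nearest_vehicle(elem):
    # Walk forward through the siblings until the vehicle element appears.
    for sibling in elem.next_siblings:
        if sibling.name == 'yourmpgvehicle':
            return sibling

def find_avg_mpg(elem):
    # Look through the direct children for the avgMpg element.
    for child in elem.children:
        if child.name == 'avgmpg':
            return child

year_2013 = [year for year in mpgs.find_all('year')
             if year.string == '2013']
avgmpg = [find_avg_mpg(find_nearest_vehicle(elem)).string
          for elem in year_2013]
print(avgmpg)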
When I run this on your file, I get:
$ python3 mpg.py
['32.261695541', '33.383275416']
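The same sibling walk can probably also be written with bs4's built-in find_next_sibling and find methods instead of the hand-rolled loops. This is only a minimal sketch: it assumes the data is parsed with html.parser (which lowercases tag names), it inlines the sample data as a string instead of reading mpg.xml, and the helper name avg_mpg_for_year is made up for illustration.

```python
from bs4 import BeautifulSoup

# Sample data copied from the question, inlined so the sketch is self-contained.
XML = """
<year>2013</year>
<youSaveSpend>2500</youSaveSpend>
<yourMpgVehicle>
<avgMpg>32.261695541</avgMpg>
<cityPercent>43</cityPercent>
<highwayPercent>57</highwayPercent>
</yourMpgVehicle>
<year>2013</year>
<youSaveSpend>3000</youSaveSpend>
<yourMpgVehicle>
<avgMpg>33.383275416</avgMpg>
<cityPercent>49</cityPercent>
<highwayPercent>51</highwayPercent>
</yourMpgVehicle>
<year>2012</year>
<youSaveSpend>2500</youSaveSpend>
<yourMpgVehicle>
<avgMpg>36.210640188</avgMpg>
<cityPercent>32</cityPercent>
<highwayPercent>68</highwayPercent>
</yourMpgVehicle>
"""


def avg_mpg_for_year(markup, wanted_year):
    """Return the avgMpg strings whose preceding <year> equals wanted_year.

    Hypothetical helper for illustration; wanted_year must be a string.
    """
    # html.parser lowercases tag names, hence the lowercase lookups below.
    soup = BeautifulSoup(markup, 'html.parser')
    results = []
    for year in soup.find_all('year'):
        if year.get_text(strip=True) != wanted_year:
            continue
        # First following sibling tag named yourmpgvehicle, or None if absent.
        vehicle = year.find_next_sibling('yourmpgvehicle')
        if vehicle is None:
            continue
        avg = vehicle.find('avgmpg')
        if avg is not None:
            results.append(avg.get_text(strip=True))
    return results


print(avg_mpg_for_year(XML, '2013'))
```

Like the version above, this relies on every year being followed by its own yourMpgVehicle. If a record were missing that element, the search would carry on into the next record, so treat it as a sketch for data shaped like yours.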
Upvotes: 1
Reputation: 9117
You're almost there; just change your final line to this:
result = [avgMpg for year, avgMpg in zip(listOfYears, listOfAvgMpg) if year=='2013']
Note that '2013' is a string, not an integer.
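To see why the quotes matter: Python never treats a str and an int as equal, so comparing against the integer 2013 would quietly filter everything out instead of raising an error. A small sketch, with the two lists hard-coded to the values from your data instead of parsed:

```python
# Text pulled out of the parsed document is always a string.
listOfYears = ['2013', '2013', '2012']
listOfAvgMpg = ['32.261695541', '33.383275416', '36.210640188']

# Comparing with the string '2013' matches the first two entries.
with_str = [mpg for year, mpg in zip(listOfYears, listOfAvgMpg) if year == '2013']

# Comparing with the integer 2013 matches nothing, and raises no error.
with_int = [mpg for year, mpg in zip(listOfYears, listOfAvgMpg) if year == 2013]

print(with_str)  # ['32.261695541', '33.383275416']
print(with_int)  # []
```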
Or, for shorter overall code (I converted the years to ints and the avgMpg values to floats):
from bs4 import BeautifulSoup as BS

# `string` holds the XML text; the lxml HTML parser lowercases tag names.
soup = BS(string, 'lxml')
listOfYears = [int(el.string) for el in soup.find_all('year')]
listOfAvgMpg = [float(el.string) for el in soup.find_all('avgmpg')]
result = [avgMpg for year, avgMpg in zip(listOfYears, listOfAvgMpg) if year == 2013]
print(result)
Result:
[32.261695541, 33.383275416]
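If you still want a year-to-avgMpg lookup like the dictionary in your question, one option is to map each year to a list of values instead of a single value. A rough sketch using collections.defaultdict, with the parsed lists hard-coded here for brevity (the names collapsed and mpgByYear are just illustrative):

```python
from collections import defaultdict

# Parsed values from the question's data, hard-coded for this sketch.
listOfYears = [2013, 2013, 2012]
listOfAvgMpg = [32.261695541, 33.383275416, 36.210640188]

# dict(zip(...)) keeps only the last value seen for each key, which is
# the duplicate problem described in the question.
collapsed = dict(zip(listOfYears, listOfAvgMpg))

# Mapping each year to a list keeps every value, in document order.
mpgByYear = defaultdict(list)
for year, avgMpg in zip(listOfYears, listOfAvgMpg):
    mpgByYear[year].append(avgMpg)

print(collapsed[2013])  # 33.383275416
print(mpgByYear[2013])  # [32.261695541, 33.383275416]
```

Both this and the zip-based filter pair values purely by position, so they assume every year in the file has exactly one avgMpg.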
Upvotes: 1