Reputation: 415
I have an XML file that looks like this:
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<feed xml:base="http://data.treasury.gov:8001/Feed.svc/" xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices" xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata" xmlns="http://www.w3.org/2005/Atom">
<title type="text">DailyTreasuryYieldCurveRateData</title>
<id>http://data.treasury.gov:8001/feed.svc/DailyTreasuryYieldCurveRateData</id>
<updated>2015-08-30T15:17:09Z</updated>
<link rel="self" title="DailyTreasuryYieldCurveRateData" href="DailyTreasuryYieldCurveRateData" />
<entry>
<id>http://data.treasury.gov:8001/Feed.svc/DailyTreasuryYieldCurveRateData(6404)</id>
<title type="text"></title>
<updated>2015-08-30T15:17:09Z</updated>
<author>
<name />
</author>
<link rel="edit" title="DailyTreasuryYieldCurveRateDatum" href="DailyTreasuryYieldCurveRateData(6404)" />
<category term="TreasuryDataWarehouseModel.DailyTreasuryYieldCurveRateDatum" scheme="http://schemas.microsoft.com/ado/2007/08/dataservices/scheme" />
<content type="application/xml">
<m:properties>
<d:Id m:type="Edm.Int32">6404</d:Id>
<d:NEW_DATE m:type="Edm.DateTime">2015-08-03T00:00:00</d:NEW_DATE>
<d:BC_1MONTH m:type="Edm.Double">0.03</d:BC_1MONTH>
<d:BC_3MONTH m:type="Edm.Double">0.08</d:BC_3MONTH>
<d:BC_6MONTH m:type="Edm.Double">0.17</d:BC_6MONTH>
<d:BC_1YEAR m:type="Edm.Double">0.33</d:BC_1YEAR>
<d:BC_2YEAR m:type="Edm.Double">0.68</d:BC_2YEAR>
<d:BC_3YEAR m:type="Edm.Double">0.99</d:BC_3YEAR>
<d:BC_5YEAR m:type="Edm.Double">1.52</d:BC_5YEAR>
<d:BC_7YEAR m:type="Edm.Double">1.89</d:BC_7YEAR>
<d:BC_10YEAR m:type="Edm.Double">2.16</d:BC_10YEAR>
<d:BC_20YEAR m:type="Edm.Double">2.55</d:BC_20YEAR>
<d:BC_30YEAR m:type="Edm.Double">2.86</d:BC_30YEAR>
<d:BC_30YEARDISPLAY m:type="Edm.Double">2.86</d:BC_30YEARDISPLAY>
</m:properties>
</content>
</entry>
<entry>
<id>http://data.treasury.gov:8001/Feed.svc/DailyTreasuryYieldCurveRateData(6405)</id>
<title type="text"></title>
<updated>2015-08-30T15:17:09Z</updated>
<author>
<name />
</author>
<link rel="edit" title="DailyTreasuryYieldCurveRateDatum" href="DailyTreasuryYieldCurveRateData(6405)" />
<category term="TreasuryDataWarehouseModel.DailyTreasuryYieldCurveRateDatum" scheme="http://schemas.microsoft.com/ado/2007/08/dataservices/scheme" />
<content type="application/xml">
<m:properties>
<d:Id m:type="Edm.Int32">6405</d:Id>
<d:NEW_DATE m:type="Edm.DateTime">2015-08-04T00:00:00</d:NEW_DATE>
<d:BC_1MONTH m:type="Edm.Double">0.05</d:BC_1MONTH>
<d:BC_3MONTH m:type="Edm.Double">0.08</d:BC_3MONTH>
<d:BC_6MONTH m:type="Edm.Double">0.18</d:BC_6MONTH>
<d:BC_1YEAR m:type="Edm.Double">0.37</d:BC_1YEAR>
<d:BC_2YEAR m:type="Edm.Double">0.74</d:BC_2YEAR>
<d:BC_3YEAR m:type="Edm.Double">1.08</d:BC_3YEAR>
<d:BC_5YEAR m:type="Edm.Double">1.6</d:BC_5YEAR>
<d:BC_7YEAR m:type="Edm.Double">1.97</d:BC_7YEAR>
<d:BC_10YEAR m:type="Edm.Double">2.23</d:BC_10YEAR>
<d:BC_20YEAR m:type="Edm.Double">2.59</d:BC_20YEAR>
<d:BC_30YEAR m:type="Edm.Double">2.9</d:BC_30YEAR>
<d:BC_30YEARDISPLAY m:type="Edm.Double">2.9</d:BC_30YEARDISPLAY>
</m:properties>
</content>
</entry>
</feed>
How can I parse out the '2.16' for 'BC_10YEAR'? I've been looking at other examples with ElementTree and lxml and I just can't seem to match up the xml format in those examples with that of my file.
The last thing I've tried was:
from lxml import etree
doc = etree.parse(yield_xml)
memoryElem = doc.find('content')
print memoryElem.text # element text
print memoryElem.get('type') # attribute
I get an error: AttributeError: 'NoneType' object has no attribute 'text'
Is there a simple way to do this?
Upvotes: 0
Views: 785
Reputation: 89325
I'd suggest using lxml's xpath() method, which provides better XPath expression support:
from lxml import etree
doc = etree.parse(yield_xml)
# Register the prefixes to be used in the XPath expression
ns = {
    "foo": "http://www.w3.org/2005/Atom",
    "d": "http://schemas.microsoft.com/ado/2007/08/dataservices",
    "m": "http://schemas.microsoft.com/ado/2007/08/dataservices/metadata",
}
# Select the <d:BC_10YEAR> element and convert its value to a number
result = doc.xpath("number(//foo:content/m:properties/d:BC_10YEAR)", namespaces=ns)
# Print the result and its type
print(result)
print(type(result))
Output:
2.16
<type 'float'>
In case you wonder why the XPath expression above uses foo:content instead of just content, that's because content implicitly inherits the default namespace from the root element, and the default namespace URI is mapped to the prefix foo in the code above. Related question: parsing xml containing default namespace to get an element value using lxml
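The same namespace handling also works with the standard library's xml.etree.ElementTree. The following is only a sketch: SAMPLE is a trimmed copy of the feed from the question, the prefix name atom is an arbitrary choice (like foo above), and bc_10year_by_date is a hypothetical helper name. It maps each entry's NEW_DATE to its BC_10YEAR value:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the feed from the question (assumption: same namespaces).
SAMPLE = """<feed xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices" xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata" xmlns="http://www.w3.org/2005/Atom">
<entry><content type="application/xml"><m:properties>
<d:NEW_DATE m:type="Edm.DateTime">2015-08-03T00:00:00</d:NEW_DATE>
<d:BC_10YEAR m:type="Edm.Double">2.16</d:BC_10YEAR>
</m:properties></content></entry>
<entry><content type="application/xml"><m:properties>
<d:NEW_DATE m:type="Edm.DateTime">2015-08-04T00:00:00</d:NEW_DATE>
<d:BC_10YEAR m:type="Edm.Double">2.23</d:BC_10YEAR>
</m:properties></content></entry>
</feed>"""

# Prefix -> namespace URI. "atom" stands in for the feed's default namespace.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "d": "http://schemas.microsoft.com/ado/2007/08/dataservices",
    "m": "http://schemas.microsoft.com/ado/2007/08/dataservices/metadata",
}


def bc_10year_by_date(xml_text):
    """Return {NEW_DATE text: BC_10YEAR as float} for every entry in the feed."""
    root = ET.fromstring(xml_text)
    rates = {}
    # Every unprefixed element (entry, content) lives in the Atom namespace,
    # so each step of the path needs the "atom:" prefix.
    for props in root.findall("atom:entry/atom:content/m:properties", NS):
        date = props.findtext("d:NEW_DATE", namespaces=NS)
        value = props.findtext("d:BC_10YEAR", namespaces=NS)
        rates[date] = float(value)
    return rates


print(bc_10year_by_date(SAMPLE))
```

For a file on disk, ET.parse(path).getroot() would take the place of ET.fromstring(xml_text).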
Upvotes: 0
Reputation: 926
You may try the built-in split method (this assumes xml_file holds the XML text):
>>> [data.split('>')[1].split('<')[0] for data in str(xml_file).split('<d:') if 'BC_10YEAR' in data][0]
'2.16'
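For reference, here is a self-contained version of the one-liner above, assuming the XML has already been read into a string. The names xml_text and bc_10year_values are hypothetical, and the sample is a shortened fragment of the question's file. String splitting ignores XML structure, so this sketch only holds while the tag layout matches the sample:

```python
# Shortened fragment of the question's XML (assumption: already read as text).
xml_text = (
    '<m:properties>'
    '<d:BC_7YEAR m:type="Edm.Double">1.89</d:BC_7YEAR>'
    '<d:BC_10YEAR m:type="Edm.Double">2.16</d:BC_10YEAR>'
    '<d:BC_20YEAR m:type="Edm.Double">2.55</d:BC_20YEAR>'
    '</m:properties>'
)


def bc_10year_values(text):
    """Collect the text of every <d:BC_10YEAR> element using plain string splits."""
    values = []
    # Each chunk after a '<d:' split starts with a tag name such as 'BC_10YEAR'.
    for chunk in text.split('<d:'):
        # startswith is slightly stricter than the 'in' test used above.
        if chunk.startswith('BC_10YEAR'):
            # Text between the first '>' and the next '<' is the element value.
            values.append(chunk.split('>')[1].split('<')[0])
    return values


print(bc_10year_values(xml_text))
```

The values come back as strings; wrap them in float() if numbers are needed.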
Upvotes: 1