Reputation:
I'm trying to extract data from an XML file with a Python script, but I can't get it to go deep enough. I managed to extract the 'updated' and 'published' elements, for instance, but not the rest. I'm particularly interested in extracting alt1 and alt2.
Here is the structure of the XML file:
<?xml version='1.0' encoding='UTF-8'?>
<entry xmlns="http://www.w3.org/2005/Atom" xmlns:as="http://atomserver.org/namespaces/1.0/">
<id>/electron/atom/v1/domesday/dblocks-CI-52-54/CI-522000-5473000.xml</id>
<as:revision>0</as:revision>
<updated>2011-04-29T11:40:19.000Z</updated>
<published>2011-04-29T11:40:19.000Z</published>
<content type="application/xml">
<block xmlns="">
<alt1>Some text</alt1>
<alt2>Some other thext</alt2>
</block>
</content>
</entry>
And here is what I tried so far:
import xml.etree.ElementTree as ET
tree = ET.parse(filename)
root = tree.getroot()
alt1elt = root.findtext('content/dblock/alt1')
alt2elt = root.findtext('content/dblock/alt2')
print(alt1elt)
print(alt2elt)
It prints
None
None
and not the two strings I'm trying to get. Do you have any idea what could solve this?
Upvotes: 2
Views: 1842
Reputation: 34
from xml.dom import minidom
doc = minidom.parse("yourxmlfile.xml")
print(doc.getElementsByTagName("alt1")[0].firstChild.data)
print(doc.getElementsByTagName("alt2")[0].firstChild.data)
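This extracts the data using minidom. getElementsByTagName matches elements by tag name at any depth, so the nesting and the namespace declarations in the file do not get in the way.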
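If you would rather stay with ElementTree, the lookup in the question probably returns None for two reasons: the path says 'dblock' while the element in the file is 'block', and 'content' inherits the Atom default namespace from 'entry', so ElementTree only finds it under its namespace-qualified name. 'block' resets the namespace with xmlns="", so it and its children are matched by their plain names. Here is a minimal sketch under those assumptions; the names ATOM_NS, SAMPLE and extract_alts are made up for illustration, and the XML is inlined so the snippet runs on its own.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI map for find/findtext; the prefix "atom" is an arbitrary choice.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Inline copy of the document from the question (bytes, so the encoding
# declaration is honoured by the parser).
SAMPLE = b"""<?xml version='1.0' encoding='UTF-8'?>
<entry xmlns="http://www.w3.org/2005/Atom" xmlns:as="http://atomserver.org/namespaces/1.0/">
<id>/electron/atom/v1/domesday/dblocks-CI-52-54/CI-522000-5473000.xml</id>
<as:revision>0</as:revision>
<updated>2011-04-29T11:40:19.000Z</updated>
<published>2011-04-29T11:40:19.000Z</published>
<content type="application/xml">
<block xmlns="">
<alt1>Some text</alt1>
<alt2>Some other thext</alt2>
</block>
</content>
</entry>"""


def extract_alts(xml_bytes):
    """Return the text of alt1 and alt2 (hypothetical helper)."""
    root = ET.fromstring(xml_bytes)
    # 'content' lives in the Atom namespace; 'block' and its children
    # are in no namespace because of xmlns="".
    alt1 = root.findtext("atom:content/block/alt1", namespaces=ATOM_NS)
    alt2 = root.findtext("atom:content/block/alt2", namespaces=ATOM_NS)
    return alt1, alt2


alt1, alt2 = extract_alts(SAMPLE)
print(alt1)
print(alt2)
```

For a file on disk, ET.parse(filename).getroot() can replace ET.fromstring(xml_bytes); the two findtext calls stay the same.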
Upvotes: 1