Reputation: 523
Suppose I have the following XML response from the MediaWiki API. I want to find the earliest date on which the wiki topic was revised, which in this case is 2005-08-23. How do I parse the XML to find that? I'm using Python.
<?xml version="1.0"?>
<api>
<query-continue>
<revisions rvcontinue="46214352" />
</query-continue>
<query>
<pageids>
<id>2516600</id>
</pageids>
<pages>
<page pageid="2516600" ns="0" title="!Kung language">
<revisions>
<rev timestamp="2005-08-23T00:58:40Z" />
<rev timestamp="2005-08-23T01:01:00Z" />
<rev timestamp="2005-09-02T07:21:37Z" />
<rev timestamp="2005-09-02T07:24:28Z" />
<rev timestamp="2006-01-06T07:45:35Z" />
<rev timestamp="2006-03-22T09:03:23Z" />
<rev timestamp="2006-03-30T05:50:12Z" />
<rev timestamp="2006-03-30T20:33:22Z" />
<rev timestamp="2006-03-30T20:35:05Z" />
<rev timestamp="2006-03-30T20:37:16Z" />
</revisions>
</page>
</pages>
</query>
</api>
I tried the following:
revisions = text.getElementsByTagName("revisions")
for x in revisions:
    children = x.childNodes
    for y in children:
        print y.nodeValue
but all this does is print None.
Upvotes: 1
Views: 54
Reputation: 298256
I would use lxml with an XPath expression:
from lxml import etree
root = etree.fromstring(xml)
timestamps = root.xpath('//rev/@timestamp')
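Once the timestamps are collected, the earliest one still has to be picked out. Because they are all ISO 8601 strings in the same UTC format, plain string comparison should order them chronologically, so min() is enough. Below is a minimal sketch of that step. It uses the standard library's xml.etree.ElementTree rather than lxml so that it runs without extra packages, and the xml string is a trimmed, reordered copy of the response from the question; both choices are assumptions made for illustration. With lxml, calling min(timestamps) on the XPath result should work the same way.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the API response from the question (assumed to be a str).
# The rev elements are deliberately out of order to show that min() does the work.
xml = """<api>
  <query>
    <pages>
      <page pageid="2516600" ns="0" title="!Kung language">
        <revisions>
          <rev timestamp="2005-09-02T07:21:37Z" />
          <rev timestamp="2005-08-23T00:58:40Z" />
          <rev timestamp="2006-03-30T20:37:16Z" />
        </revisions>
      </page>
    </pages>
  </query>
</api>"""

root = ET.fromstring(xml)

# Collect the timestamp attribute of every <rev> element in the document
timestamps = [rev.get("timestamp") for rev in root.iter("rev")]

# Same-format ISO 8601 UTC strings compare chronologically as plain strings
earliest = min(timestamps)

# Keep only the date part, before the "T" separator
earliest_date = earliest.split("T")[0]
print(earliest_date)
```

If real datetime objects are needed later, the strings can be parsed with datetime.strptime and the format "%Y-%m-%dT%H:%M:%SZ".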
As for your code, you aren't reading the attribute of the element. The timestamp is stored in an attribute, and nodeValue is always None for element nodes. To read the attribute, use getAttribute:
print y.getAttribute('timestamp')
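For completeness, here is a sketch of the original minidom loop with that change applied, written for Python 3. One caveat: when the XML contains line breaks between the rev elements, childNodes also holds whitespace text nodes, and those have no getAttribute method, so the loop filters on nodeType first. The name doc and the shortened xml string are assumptions for illustration; the question calls its parsed document text.

```python
from xml.dom import minidom

# Shortened copy of the response from the question (assumed to be a str)
xml = """<api><query><pages><page pageid="2516600">
<revisions>
<rev timestamp="2005-08-23T00:58:40Z" />
<rev timestamp="2005-08-23T01:01:00Z" />
<rev timestamp="2005-09-02T07:21:37Z" />
</revisions>
</page></pages></query></api>"""

doc = minidom.parseString(xml)

timestamps = []
for revisions in doc.getElementsByTagName("revisions"):
    for node in revisions.childNodes:
        # Skip whitespace text nodes; only element nodes carry attributes
        if node.nodeType == node.ELEMENT_NODE:
            timestamps.append(node.getAttribute("timestamp"))

# Same-format ISO 8601 strings sort chronologically, so min() is the earliest
print(min(timestamps))
```

Calling doc.getElementsByTagName("rev") directly would avoid the nodeType check, since it returns only element nodes.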
Upvotes: 1