gogasca
gogasca

Reputation: 10048

XML Parse attributes using namespace

I'm using the following XML:

<feed xmlns:im="http://itunes.apple.com/rss" xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
<id>
https://itunes.apple.com/IN/rss/topfreeapplications/limit=200/xml
</id>
<title>iTunes Store: Top Free Apps</title>
<updated>2016-12-05T12:37:06-07:00</updated>
<link rel="alternate" type="text/html" href="https://itunes.apple.com/WebObjects/MZStore.woa/wa/viewTop?cc=in&amp;id=134581&amp;popId=27"/>
<link rel="self" href="https://itunes.apple.com/IN/rss/topfreeapplications/limit=200/xml"/>
<icon>http://itunes.apple.com/favicon.ico</icon>
<author>
    <name>iTunes Store</name>
    <uri>http://www.apple.com/uk/itunes/</uri>
</author>
<rights>Copyright 2008 Apple Inc.</rights>
<entry>
    <updated>2016-12-05T12:37:06-07:00</updated>
<id im:id="473941634" im:bundleId="com.one97.paytm">https://itunes.apple.com/in/app/recharge-bill-payment-wallet/id473941634?mt=8&amp;uo=2</id>
<title>Recharge, Bill Payment &amp; Wallet - Paytm Mobile Solutions</title>
<summary></summary>
<im:name>Recharge, Bill Payment &amp; Wallet</im:name>
<link rel="alternate" type="text/html" href="https://itunes.apple.com/in/app/recharge-bill-payment-wallet/id473941634?mt=8&amp;uo=2"/>
<im:contentType term="Application" label="Application"/>
<category im:id="6024" term="Shopping" scheme="https://itunes.apple.com/in/genre/ios-shopping/id6024?mt=8&amp;uo=2" label="Shopping"/>
<im:artist href="https://itunes.apple.com/in/developer/paytm-mobile-solutions/id473941637?mt=8&amp;uo=2">Paytm Mobile Solutions</im:artist>
<im:price amount="0.00000" currency="INR">Get</im:price>
<im:image height="53">http://is1.mzstatic.com/image/thumb/Purple71/v4/9b/37/bf/9b37bf75-6b4d-9c95-a8a4-ea369f05ae7e/pr_source.png/53x53bb-85.png</im:image>
<im:image height="75">http://is5.mzstatic.com/image/thumb/Purple71/v4/9b/37/bf/9b37bf75-6b4d-9c95-a8a4-ea369f05ae7e/pr_source.png/75x75bb-85.png</im:image>
<im:image height="100">http://is5.mzstatic.com/image/thumb/Purple71/v4/9b/37/bf/9b37bf75-6b4d-9c95-a8a4-ea369f05ae7e/pr_source.png/100x100bb-85.png</im:image>
<rights>© One97 Communications Ltd</rights>
<im:releaseDate label="24 October 2011">2011-10-24T16:18:48-07:00</im:releaseDate>
<content type="html"></content>
</entry>
</feed>

I would like to extract the id information for each entry value: the attribute is as follows: "im:id"

from xml.dom import minidom
xmldoc = minidom.parse('topIN.xml')
itemlist = xmldoc.getElementsByTagName('link')
print(len(itemlist))
print(itemlist[0].attributes.keys())

I get information: 1 [u'href', u'type', u'rel']

But when I do the same of id, nothing returns.

Upvotes: 0

Views: 70

Answers (2)

gogasca
gogasca

Reputation: 10048

This worked:

   from xml.dom import minidom
    xmldoc = minidom.parse('topIN.xml')
    itemlist = xmldoc.getElementsByTagName('entry')
    print(len(itemlist))
    for s in itemlist:
        print s.getElementsByTagName('id')[0].attributes['im:id'].value

Upvotes: 0

Robᵩ
Robᵩ

Reputation: 168596

Here is a version using xml.etree.ElementTree:

import xml.etree.ElementTree as ET

tree = ET.parse('topIN.xml')
root = tree.getroot()
ns={'im':"http://itunes.apple.com/rss", 'atom':"http://www.w3.org/2005/Atom"}
for id_ in root.findall('atom:entry/atom:id', ns):
    print (id_.attrib['{' + ns['im'] + '}id'])

Here is a version using lxml:

from lxml import etree
root=etree.parse('topIN.xml')
ns={'im':"http://itunes.apple.com/rss", 'atom':"http://www.w3.org/2005/Atom"}
print('\n'.join(root.xpath('atom:entry/atom:id/@im:id', namespaces=ns)))

Upvotes: 1

Related Questions