Chang9

Reputation: 3

Cannot get all attributes when parsing an XML file with Python

This is the XML file 'test_xml2.xml':

<feed xml:lang='en'>
  <title>HackerRank</title>
  <subtitle lang='en'>Programming challenges</subtitle>
  <link rel='alternate' type='text/html' href='http://hackerrank.com/'/>
  <updated>2013-12-25T12:00:00</updated>
  <entry>
    <author gender='male'>Harsh</author>
    <question type='hard'>XML 1</question>
    <description type='text'>This is related to XML parsing</description>
  </entry>
</feed>

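It actually has 8 attributes: xml:lang on feed, lang on subtitle, rel, type and href on link, gender on author, and type on both question and description.

But with my code: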

import xml.etree.ElementTree as etree

count = 0
xml_file = 'test_xml2.xml'
tree = etree.parse(xml_file)
root = tree.getroot()
for item in root:
    count += len(item.attrib)
    print(item.keys())
print(count)

I get the result 4.

Could someone please tell me what's going wrong?

Upvotes: 0

Views: 1111

Answers (3)

Anand S Kumar

Reputation: 90979

The loop for item in root: only iterates over the immediate children of root, not over their descendants.
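To see the difference concretely, here is a minimal sketch that compares the tags visited by iterating root directly with those visited by root.iter(). It parses the same document from an inline string (an assumption, so the snippet runs without test_xml2.xml on disk); XML_TEXT is a name introduced only for this illustration.

```python
import xml.etree.ElementTree as etree

# Same content as test_xml2.xml, inlined so the sketch is self-contained
XML_TEXT = """<feed xml:lang='en'>
  <title>HackerRank</title>
  <subtitle lang='en'>Programming challenges</subtitle>
  <link rel='alternate' type='text/html' href='http://hackerrank.com/'/>
  <updated>2013-12-25T12:00:00</updated>
  <entry>
    <author gender='male'>Harsh</author>
    <question type='hard'>XML 1</question>
    <description type='text'>This is related to XML parsing</description>
  </entry>
</feed>"""

root = etree.fromstring(XML_TEXT)

# Iterating an Element yields only its direct children
direct_tags = [child.tag for child in root]

# root.iter() walks the whole subtree, starting with root itself
all_tags = [elem.tag for elem in root.iter()]

print(direct_tags)  # the five children of feed; author, question, description are missing
print(all_tags)
```

The three elements under entry (author, question and description) never appear in the first list, which is where the missing attributes live.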

One way to meet your requirement is to use the XPath expression .//* with root.findall() to get all elements below root (as a list), and then iterate over that list to count the attributes.

Please note that .//* does not match the root element itself, so count needs to be initialized with the length of root.attrib.

Example:

>>> count = len(root.attrib)
>>> elements = root.findall(".//*")
>>> for item in elements:
...     count += len(item.attrib)
...     print(item.keys())
[]
['lang']
['href', 'type', 'rel']
[]
[]
['gender']
['type']
['type']
>>> print(count)
8
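The session above relies on root from the question's code. Below is a self-contained sketch of the same findall approach, parsing the document from an inline string instead of the file (an assumption, so it runs anywhere). It also shows that ElementTree stores the root's xml:lang attribute under its expanded namespace name rather than as 'xml:lang'.

```python
import xml.etree.ElementTree as etree

# Same content as test_xml2.xml, inlined so the sketch is self-contained
XML_TEXT = """<feed xml:lang='en'>
  <title>HackerRank</title>
  <subtitle lang='en'>Programming challenges</subtitle>
  <link rel='alternate' type='text/html' href='http://hackerrank.com/'/>
  <updated>2013-12-25T12:00:00</updated>
  <entry>
    <author gender='male'>Harsh</author>
    <question type='hard'>XML 1</question>
    <description type='text'>This is related to XML parsing</description>
  </entry>
</feed>"""

root = etree.fromstring(XML_TEXT)

# The reserved xml: prefix is expanded to the XML namespace URI
print(root.attrib)

# Root's own attributes, plus those of every element matched by .//*
count = len(root.attrib) + sum(len(e.attrib) for e in root.findall(".//*"))
print(count)
```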

Upvotes: 1

Scott Hunter

Reputation: 49893

The items in root are the title, subtitle, link, updated and entry nodes. Of these, subtitle has 1 attribute (lang) and link has 3 (rel, type and href), which gives 4 attributes.

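Your code needs to go one level deeper, into the children of those items (the children of entry, specifically). Root's own xml:lang attribute is not counted either, because the loop starts at root's children.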
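Descending into every level can be done with a short recursive function. This is a sketch, not the answerer's code: count_attributes and XML_TEXT are hypothetical names, and the document is parsed from an inline string rather than the file. The function counts an element's own attributes and then recurses into each child, so it also picks up root's xml:lang.

```python
import xml.etree.ElementTree as etree

# Same content as test_xml2.xml, inlined so the sketch is self-contained
XML_TEXT = """<feed xml:lang='en'>
  <title>HackerRank</title>
  <subtitle lang='en'>Programming challenges</subtitle>
  <link rel='alternate' type='text/html' href='http://hackerrank.com/'/>
  <updated>2013-12-25T12:00:00</updated>
  <entry>
    <author gender='male'>Harsh</author>
    <question type='hard'>XML 1</question>
    <description type='text'>This is related to XML parsing</description>
  </entry>
</feed>"""


def count_attributes(elem):
    """Return the number of attributes on elem and all of its descendants."""
    # This element's attributes plus the totals of each child's subtree
    return len(elem.attrib) + sum(count_attributes(child) for child in elem)


root = etree.fromstring(XML_TEXT)

# What the question's loop computes: direct children only
direct_only = sum(len(child.attrib) for child in root)

print(direct_only)
print(count_attributes(root))
```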

Upvotes: 0

Robᵩ

Reputation: 168726

This loop:

for item in root:
    count += len(item.attrib)

iterates over the immediate children of root, not the grandchildren or deeper descendants.

Perhaps this will help:

count = 0
for item in root.iter():
    count += len(item.attrib)
print(count)
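Because root.iter() yields root itself before its descendants, this loop also counts the xml:lang attribute on feed, so count does not need to start at len(root.attrib). A compact, self-contained sketch of the same idea using a generator expression, parsing the document from an inline string (an assumption for the sake of a runnable example; XML_TEXT is an illustrative name):

```python
import xml.etree.ElementTree as etree

# Same content as test_xml2.xml, inlined so the sketch is self-contained
XML_TEXT = """<feed xml:lang='en'>
  <title>HackerRank</title>
  <subtitle lang='en'>Programming challenges</subtitle>
  <link rel='alternate' type='text/html' href='http://hackerrank.com/'/>
  <updated>2013-12-25T12:00:00</updated>
  <entry>
    <author gender='male'>Harsh</author>
    <question type='hard'>XML 1</question>
    <description type='text'>This is related to XML parsing</description>
  </entry>
</feed>"""

root = etree.fromstring(XML_TEXT)

# iter() yields root first, then every descendant in document order
first = next(root.iter())
print(first.tag)

# Sum the attribute counts of root and all of its descendants
count = sum(len(elem.attrib) for elem in root.iter())
print(count)
```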

Upvotes: 1
