I'm Root James
I'm Root James

Reputation: 6535

Python XML ElementTree cannot iter(), find() or findall()

I can an xml file and loop through the root printing, but root.iter('tag'), root.find('tag') and root.findall('tag') will not work.

Here is a sample of the XML:

<?xml version='1.0' encoding='UTF-8'?>
<cpe-list xmlns:config="http://scap.nist.gov/schema/configuration/0.1" xmlns="http://cpe.mitre.org/dictionary/2.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:scap-core="http://scap.nist.gov/schema/scap-core/0.3" xmlns:cpe-23="http://scap.nist.gov/schema/cpe-extension/2.3" xmlns:ns6="http://scap.nist.gov/schema/scap-core/0.1" xmlns:meta="http://scap.nist.gov/schema/cpe-dictionary-metadata/0.2" xsi:schemaLocation="http://scap.nist.gov/schema/cpe-extension/2.3 https://scap.nist.gov/schema/cpe/2.3/cpe-dictionary-extension_2.3.xsd http://cpe.mitre.org/dictionary/2.0 https://scap.nist.gov/schema/cpe/2.3/cpe-dictionary_2.3.xsd http://scap.nist.gov/schema/cpe-dictionary-metadata/0.2 https://scap.nist.gov/schema/cpe/2.1/cpe-dictionary-metadata_0.2.xsd http://scap.nist.gov/schema/scap-core/0.3 https://scap.nist.gov/schema/nvd/scap-core_0.3.xsd http://scap.nist.gov/schema/configuration/0.1 https://scap.nist.gov/schema/nvd/configuration_0.1.xsd http://scap.nist.gov/schema/scap-core/0.1 https://scap.nist.gov/schema/nvd/scap-core_0.1.xsd">
  <generator>
    <product_name>National Vulnerability Database (NVD)</product_name>
    <product_version>4.4</product_version>
    <schema_version>2.3</schema_version>
    <timestamp>2021-05-21T03:50:31.204Z</timestamp>
  </generator>
  <cpe-item name="cpe:/a:%240.99_kindle_books_project:%240.99_kindle_books:6::~~~android~~">
    <title xml:lang="en-US">$0.99 Kindle Books project $0.99 Kindle Books (aka com.kindle.books.for99) for android 6.0</title>
    <references>
      <reference href="https://play.google.com/store/apps/details?id=com.kindle.books.for99">Product information</reference>
      <reference href="https://docs.google.com/spreadsheets/d/1t5GXwjw82SyunALVJb2w0zi3FoLRIkfGPc7AMjRF0r4/edit?pli=1#gid=1053404143">Government Advisory</reference>
    </references>
    <cpe-23:cpe23-item name="cpe:2.3:a:\$0.99_kindle_books_project:\$0.99_kindle_books:6:*:*:*:*:android:*:*"/>
  </cpe-item>
  <cpe-item name="cpe:/a:%40thi.ng%2fegf_project:%40thi.ng%2fegf:-::~~~node.js~~">
    <title xml:lang="en-US">@thi.ng/egf Project @thi.ng/egf for Node.js</title>
    <references>
      <reference href="https://github.com/thi-ng/umbrella/security/advisories/GHSA-rj44-gpjc-29r7">Advisory</reference>
      <reference href="https://www.npmjs.com/package/@thi.ng/egf">Version</reference>
    </references>
    <cpe-23:cpe23-item name="cpe:2.3:a:\@thi.ng\/egf_project:\@thi.ng\/egf:-:*:*:*:*:node.js:*:*"/>
  </cpe-item>
</cpe-list>

The followig Python (3.7) code works:

import xml.etree.ElementTree as ET
infile = open(filename, "r")
xml = infile.read()
infile.close()
parser = ET.XMLParser(encoding="utf-8")
root = ET.fromstring(xml, parser=parser)
print(root.tag)
for child in root:
    print(child.tag)

Output:
{http://cpe.mitre.org/dictionary/2.0}cpe-list
{http://cpe.mitre.org/dictionary/2.0}cpe-item
{http://cpe.mitre.org/dictionary/2.0}cpe-item
{http://cpe.mitre.org/dictionary/2.0}cpe-item
{http://cpe.mitre.org/dictionary/2.0}cpe-item
...

But when I try:

for item in root.iter('cpe-item') or for item in root.iter('cpe-list'), nothing loops. When I try for item in root.findall('cpe-item') or for item in root.findall('cpe-list'), nothing loops. If I try item = root.find('cpe-list'), item = None.

I don't work with XML very often, but this seems so strage to me since I have some example code of other projects where this works perfectly fine. Many other examples online show this exact process is the correct process.

What is am I doing wrong? It seems odd to me that when I print(root.tag) or print(child.tag) there is something before the tag prints. I don't know why that is happening.

Upvotes: 1

Views: 1625

Answers (1)

Jack Fleeting
Jack Fleeting

Reputation: 24940

You are getting entangled with namespaces. A lot has been written about it and starting here may be a good place.

As for you specific example, the tl;dr is to disregard them altogether. For example:

for item in root.findall('.//{*}cpe-item'):
    print(item.tag)

Another option is to bite the bullet and declare the namespaces:

ns = {"xx":"http://cpe.mitre.org/dictionary/2.0"}
for item in root.findall('.//xx:cpe-item', ns):
        print(item.tag)

output is

{http://cpe.mitre.org/dictionary/2.0}cpe-item
{http://cpe.mitre.org/dictionary/2.0}cpe-item

Upvotes: 1

Related Questions