Reputation: 6535
I can parse an XML file and loop through the root, printing each child, but root.iter('tag'), root.find('tag') and root.findall('tag') will not work.
Here is a sample of the XML:
<?xml version='1.0' encoding='UTF-8'?>
<cpe-list xmlns:config="http://scap.nist.gov/schema/configuration/0.1" xmlns="http://cpe.mitre.org/dictionary/2.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:scap-core="http://scap.nist.gov/schema/scap-core/0.3" xmlns:cpe-23="http://scap.nist.gov/schema/cpe-extension/2.3" xmlns:ns6="http://scap.nist.gov/schema/scap-core/0.1" xmlns:meta="http://scap.nist.gov/schema/cpe-dictionary-metadata/0.2" xsi:schemaLocation="http://scap.nist.gov/schema/cpe-extension/2.3 https://scap.nist.gov/schema/cpe/2.3/cpe-dictionary-extension_2.3.xsd http://cpe.mitre.org/dictionary/2.0 https://scap.nist.gov/schema/cpe/2.3/cpe-dictionary_2.3.xsd http://scap.nist.gov/schema/cpe-dictionary-metadata/0.2 https://scap.nist.gov/schema/cpe/2.1/cpe-dictionary-metadata_0.2.xsd http://scap.nist.gov/schema/scap-core/0.3 https://scap.nist.gov/schema/nvd/scap-core_0.3.xsd http://scap.nist.gov/schema/configuration/0.1 https://scap.nist.gov/schema/nvd/configuration_0.1.xsd http://scap.nist.gov/schema/scap-core/0.1 https://scap.nist.gov/schema/nvd/scap-core_0.1.xsd">
<generator>
<product_name>National Vulnerability Database (NVD)</product_name>
<product_version>4.4</product_version>
<schema_version>2.3</schema_version>
<timestamp>2021-05-21T03:50:31.204Z</timestamp>
</generator>
<cpe-item name="cpe:/a:%240.99_kindle_books_project:%240.99_kindle_books:6::~~~android~~">
<title xml:lang="en-US">$0.99 Kindle Books project $0.99 Kindle Books (aka com.kindle.books.for99) for android 6.0</title>
<references>
<reference href="https://play.google.com/store/apps/details?id=com.kindle.books.for99">Product information</reference>
<reference href="https://docs.google.com/spreadsheets/d/1t5GXwjw82SyunALVJb2w0zi3FoLRIkfGPc7AMjRF0r4/edit?pli=1#gid=1053404143">Government Advisory</reference>
</references>
<cpe-23:cpe23-item name="cpe:2.3:a:\$0.99_kindle_books_project:\$0.99_kindle_books:6:*:*:*:*:android:*:*"/>
</cpe-item>
<cpe-item name="cpe:/a:%40thi.ng%2fegf_project:%40thi.ng%2fegf:-::~~~node.js~~">
<title xml:lang="en-US">@thi.ng/egf Project @thi.ng/egf for Node.js</title>
<references>
<reference href="https://github.com/thi-ng/umbrella/security/advisories/GHSA-rj44-gpjc-29r7">Advisory</reference>
<reference href="https://www.npmjs.com/package/@thi.ng/egf">Version</reference>
</references>
<cpe-23:cpe23-item name="cpe:2.3:a:\@thi.ng\/egf_project:\@thi.ng\/egf:-:*:*:*:*:node.js:*:*"/>
</cpe-item>
</cpe-list>
The following Python (3.7) code works:
import xml.etree.ElementTree as ET

# filename is the path to the CPE dictionary XML file
with open(filename, "r") as infile:
    xml = infile.read()

parser = ET.XMLParser(encoding="utf-8")
root = ET.fromstring(xml, parser=parser)
print(root.tag)
for child in root:
    print(child.tag)
Output:
{http://cpe.mitre.org/dictionary/2.0}cpe-list
{http://cpe.mitre.org/dictionary/2.0}cpe-item
{http://cpe.mitre.org/dictionary/2.0}cpe-item
{http://cpe.mitre.org/dictionary/2.0}cpe-item
{http://cpe.mitre.org/dictionary/2.0}cpe-item
...
But when I try for item in root.iter('cpe-item') or for item in root.iter('cpe-list'), nothing loops. When I try for item in root.findall('cpe-item') or for item in root.findall('cpe-list'), nothing loops. If I try item = root.find('cpe-list'), item is None.
I don't work with XML very often, but this seems so strange to me, since I have example code from other projects where this works perfectly fine. Many other examples online show that this exact process is the correct one.
What am I doing wrong?
It also seems odd to me that when I print(root.tag) or print(child.tag), something is printed before the tag name. I don't know why that is happening.
Upvotes: 1
Views: 1625
Reputation: 24940
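You are getting entangled with namespaces. The root element declares a default namespace (xmlns="http://cpe.mitre.org/dictionary/2.0"), so ElementTree stores every unprefixed tag as {http://cpe.mitre.org/dictionary/2.0}name. That is the text you see printed in front of the tag, and it is why a bare 'cpe-item' matches nothing. (Separately, cpe-list is the root element itself, so root.find and root.findall, which only search below the root, would not return it even with the namespace.) A lot has been written about it and starting here may be a good place.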
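To see the effect in isolation, here is a minimal sketch against a trimmed-down document. The SAMPLE string and the NS_URI name are made up for illustration and are not taken from the real dictionary file. It uses no wildcard, so it should also run on Python 3.7:

```python
import xml.etree.ElementTree as ET

# Trimmed-down stand-in for the CPE dictionary; only the default namespace is kept.
SAMPLE = """<cpe-list xmlns="http://cpe.mitre.org/dictionary/2.0">
  <cpe-item name="cpe:/a:example:one"/>
  <cpe-item name="cpe:/a:example:two"/>
</cpe-list>"""

root = ET.fromstring(SAMPLE)

# The parser stores each tag as "{namespace-uri}local-name".
print(root.tag)  # {http://cpe.mitre.org/dictionary/2.0}cpe-list

# A bare local name never equals the qualified tag, so nothing matches.
bare = root.findall("cpe-item")

# Spelling out the qualified name matches both items.
NS_URI = "http://cpe.mitre.org/dictionary/2.0"
qualified = list(root.iter("{%s}cpe-item" % NS_URI))

print(len(bare), len(qualified))  # 0 2
```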
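As for your specific example, the tl;dr is to disregard them altogether with the {*} wildcard (supported from Python 3.8 onward, so not on the 3.7 mentioned in the question). For example: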
for item in root.findall('.//{*}cpe-item'):
    print(item.tag)
Another option is to bite the bullet and declare the namespaces:
ns = {"xx": "http://cpe.mitre.org/dictionary/2.0"}
for item in root.findall('.//xx:cpe-item', ns):
    print(item.tag)
The output is:
{http://cpe.mitre.org/dictionary/2.0}cpe-item
{http://cpe.mitre.org/dictionary/2.0}cpe-item
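If you also need the nested elements, the same prefix map can be extended. The following is only a sketch: the SAMPLE document is a shortened, made-up stand-in for the real file, and the prefixes "cpe" and "cpe23" are arbitrary names chosen here. They only have to map to the right URIs and need not match the prefixes used in the XML. Unprefixed attributes such as name are not in any namespace, so item.get("name") works as-is.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical sample with the default and cpe-23 namespaces.
SAMPLE = """<cpe-list xmlns="http://cpe.mitre.org/dictionary/2.0"
          xmlns:cpe-23="http://scap.nist.gov/schema/cpe-extension/2.3">
  <cpe-item name="cpe:/a:example:one">
    <title xml:lang="en-US">Example One</title>
    <cpe-23:cpe23-item name="cpe:2.3:a:example:one:*:*:*:*:*:*:*:*"/>
  </cpe-item>
</cpe-list>"""

# Prefixes are local to this script; only the URIs must match the document.
ns = {
    "cpe": "http://cpe.mitre.org/dictionary/2.0",
    "cpe23": "http://scap.nist.gov/schema/cpe-extension/2.3",
}

root = ET.fromstring(SAMPLE)
rows = []
for item in root.findall("cpe:cpe-item", ns):
    title = item.find("cpe:title", ns)
    cpe23 = item.find("cpe23:cpe23-item", ns)
    # Attributes without a prefix carry no namespace.
    rows.append((item.get("name"), title.text, cpe23.get("name")))

print(rows)
```

This form works on Python 3.7, since it does not rely on the {*} wildcard.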
Upvotes: 1