jaysonpryde

Reputation: 2813

Python: Get specific node values and attributes using lxml + objectify + findall or fromstring

I extracted a portion of an XML feed from NVD; the snippet is below:

<?xml version='1.0' encoding='UTF-8'?>
<nvd xmlns="http://nvd.nist.gov/feeds/cve/1.2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://nvd.nist.gov/feeds/cve/1.2 http://nvd.nist.gov/schema/nvdcve.xsd" pub_date="2014-07-01" nvd_xml_version="1.2">
   <entry CVSS_base_score="6.4" CVSS_exploit_subscore="10.0" CVSS_impact_subscore="4.9" CVSS_score="6.4" CVSS_vector="(AV:N/AC:L/Au:N/C:P/I:P/A:N)" CVSS_version="2.0" modified="2014-06-30" name="CVE-2011-1381" published="2014-06-27" seq="2011-1381" severity="Medium" type="CVE">
      <desc>
        <descript source="cve">Unspecified vulnerability in IBM OpenPages GRC Platform 6.1.0.1 before IF4 allows remote attackers to bypass intended access restrictions via unknown vectors.</descript>
      </desc>
   </entry>
   <entry CVSS_base_score="3.5" CVSS_exploit_subscore="6.8" CVSS_impact_subscore="2.9" CVSS_score="3.5" CVSS_vector="(AV:N/AC:M/Au:S/C:P/I:N/A:N)" CVSS_version="2.0" modified="2014-06-30" name="CVE-2014-4669" published="2014-06-28" seq="2014-4669" severity="Low" type="CVE">
      <desc>
        <descript source="cve">HP Enterprise Maps 1.00 allows remote authenticated users to read arbitrary files via a WSDL document containing an XML external entity declaration in conjunction with an entity reference within a GetQuote operation, related to an XML External Entity (XXE) issue.</descript>
      </desc>
   </entry>
</nvd>

As mentioned in the title, I just want to get the text and the attributes of the 'descript' node in the snippet above. I tried the findall method, but it returns an empty list:

from lxml import etree
root = etree.fromstring(open("c:/temp/CVE/sample.xml", "rb").read()).getroottree().getroot()
root.findall('entry')

This returns:

[]

When I print the tag of the root, here's what it returns:

'{http://nvd.nist.gov/feeds/cve/1.2}nvd'

I also tried printing the tags of the immediate parent and its children:

for e in root.iterchildren():
    print("Immediate parent : %s" % e.tag)
    children = e.getchildren()
    for c in children:
        print("\t\tchildren : %s" % c.tag)

Here's what it returns:

Immediate parent : {http://nvd.nist.gov/feeds/cve/1.2}entry
    children : {http://nvd.nist.gov/feeds/cve/1.2}desc
Immediate parent : {http://nvd.nist.gov/feeds/cve/1.2}entry
    children : {http://nvd.nist.gov/feeds/cve/1.2}desc

Again, all I want is the attributes and text of the 'descript' node. Any ideas are greatly appreciated. Thanks in advance!

Upvotes: 1

Views: 1895

Answers (1)

alecxe

Reputation: 473883

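The root element declares a default namespace (xmlns="http://nvd.nist.gov/feeds/cve/1.2"), so every element in the document belongs to it, and a plain findall('entry') looks for an un-namespaced tag that does not exist. Bind that namespace to a prefix and use the prefix in the XPath expression: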

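from lxml import etree

# parse() reads the file itself, so the XML encoding declaration is handled by the parser
tree = etree.parse("c:/temp/CVE/sample.xml").getroot()
# Map a prefix of our choosing ('ns') to the document's default namespace
for descript in tree.xpath('//ns:entry/ns:desc/ns:descript', namespaces={'ns': 'http://nvd.nist.gov/feeds/cve/1.2'}):
    print(descript.text)
    print(descript.attrib.get('source'))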

Prints:

Unspecified vulnerability in IBM OpenPages GRC Platform 6.1.0.1 before IF4 allows remote attackers to bypass intended access restrictions via unknown vectors.
cve
HP Enterprise Maps 1.00 allows remote authenticated users to read arbitrary files via a WSDL document containing an XML external entity declaration in conjunction with an entity reference within a GetQuote operation, related to an XML External Entity (XXE) issue.
cve
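
Since the question asked about findall specifically, here is a rough sketch of the same lookup without xpath. It assumes a trimmed, inline copy of the feed (the SAMPLE constant and the shortened description text are stand-ins, not the real file), and it falls back to the standard library's ElementTree if lxml is not installed, on the assumption that both accept these findall calls. The names NS, entries, descripts and results are only illustrative.

```python
try:
    from lxml import etree  # the library used in this thread
except ImportError:
    # Assumption: the standard library's ElementTree accepts the same
    # findall() calls used below, so it serves as a fallback here.
    import xml.etree.ElementTree as etree

NS = "http://nvd.nist.gov/feeds/cve/1.2"

# Trimmed, inline stand-in for sample.xml (description text shortened).
SAMPLE = b"""<?xml version='1.0' encoding='UTF-8'?>
<nvd xmlns="http://nvd.nist.gov/feeds/cve/1.2">
  <entry name="CVE-2011-1381">
    <desc><descript source="cve">Unspecified vulnerability in IBM OpenPages GRC Platform.</descript></desc>
  </entry>
  <entry name="CVE-2014-4669">
    <desc><descript source="cve">HP Enterprise Maps 1.00 allows remote authenticated users to read arbitrary files.</descript></desc>
  </entry>
</nvd>"""

root = etree.fromstring(SAMPLE)

# Option 1: spell the namespace out in Clark notation, {uri}tag,
# which is the same form that e.tag printed in the question.
entries = root.findall("{%s}entry" % NS)

# Option 2: map a prefix of our choosing to the URI and use it in the path.
descripts = root.findall("ns:entry/ns:desc/ns:descript", namespaces={"ns": NS})

# Collect (text, source attribute) pairs for each descript node.
results = [(d.text, d.get("source")) for d in descripts]
for text, source in results:
    print(source, text)
```

Either form should work; the prefix mapping tends to read better once the path has more than one step, while Clark notation avoids carrying a dictionary around for a single tag.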


Upvotes: 2
