user3699853
user3699853

Reputation: 93

Python xml parsing etree find element X by postion

I'm trying to parse the following xml to pull out certain data then eventually edit the data as needed.

Here is the xml:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<CHECKLIST>
<VULN>
    <STIG_DATA>
        <VULN_ATTRIBUTE>Vuln_Num</VULN_ATTRIBUTE>
        <ATTRIBUTE_DATA>V-38438</ATTRIBUTE_DATA>
    </STIG_DATA>
    <STIG_DATA>
        <VULN_ATTRIBUTE>Rule_Title</VULN_ATTRIBUTE>
        <ATTRIBUTE_DATA>More text.</ATTRIBUTE_DATA>
    </STIG_DATA>
    <STIG_DATA>
        <VULN_ATTRIBUTE>Vuln_Discuss</VULN_ATTRIBUTE>
        <ATTRIBUTE_DATA>Some text here</ATTRIBUTE_DATA>
    </STIG_DATA>
    <STIG_DATA>
        <VULN_ATTRIBUTE>IA_Controls</VULN_ATTRIBUTE>
        <ATTRIBUTE_DATA></ATTRIBUTE_DATA>
    </STIG_DATA>
    <STIG_DATA>
        <VULN_ATTRIBUTE>Rule_Ver</VULN_ATTRIBUTE>
        <ATTRIBUTE_DATA>Gen000000</ATTRIBUTE_DATA>
    </STIG_DATA>
    <STATUS>NotAFinding</STATUS>
    <FINDING_DETAILS></FINDING_DETAILS>
    <COMMENTS></COMMENTS>        
    <SEVERITY_OVERRIDE></SEVERITY_OVERRIDE>
    <SEVERITY_JUSTIFICATION></SEVERITY_JUSTIFICATION>
</VULN>

The data that I'm looking to pull from this is the STATUS, COMMENTS and the ATTRIBUTE_DATA directly following VULN_ATTRIBUTE that matches == Rule_Ver. So in this example.

I should get the following: Gen000000 NotAFinding None

What I have so far is that I can get the Status and Comments easy, but can't figure out the ATTRIBUTE_DATA portion. I can find the first one (Vuln_Num), then I tried to add a index but that gives a "list index out of range" error.

This is where I'm at now.

import xml.etree.ElementTree as ET
doc = ET.parse('test.ckl')
root=doc.getroot()

TagList = doc.findall("./VULN")

for curTag in TagList:
    StatusTag = curTag.find("STATUS")
    CommentTag = curTag.find("COMMENTS")
    DataTag = curTag.find("./STIG_DATA/ATTRIBUTE_DATA")
    print "GEN:[%s] Status:[%s] Comments: %s" %( DataTag.text, StatusTag.text, CommentTag.text)

This gives the following output: GEN:[V-38438] Status:[NotAFinding] Comments: None

I want: GEN:[Gen000000] Status:[NotAFinding] Comments: None

So the end goal is to be able to parse hundreds of these and edit the comments field as needed. I don't think the editing part will be that hard once I get the right element.

Logically I see two ways of doing this. Either go to the ATTRIBUTE_DATA[5] and grab the text or find VULN_ATTRIBUTE == Rule_Ver then grab the next ATTRIBUTE_DATA.

I have tried doing this:

DataTag = curTag.find(".//STIG_DATA//ATTRIBUTE_DATA")[5] andDataTag[5].text`

and both give meIndexError: list index out of range

I saw lxml had get_element_by_id and xpath, but I can't add modules to this system so it is etree for me.

Thanks in advance.

Upvotes: 2

Views: 1158

Answers (1)

Robᵩ
Robᵩ

Reputation: 168596

One can find an element by position, but you've used the incorrect XPath syntax. Either of the following lines should work:

DataTag = curTag.find("./STIG_DATA[5]/ATTRIBUTE_DATA")    # Note: 5, not 4
DataTag = curTag.findall("./STIG_DATA/ATTRIBUTE_DATA")[4] # Note: 4, not 5

However, I strongly recommend against using that. There is no guarantee that the Rule_Ver instance of STIG_DATA is always the fifth item.

If you could change to lxml, then this works:

DataTag = curTag.xpath(
    './STIG_DATA/VULN_ATTRIBUTE[text()="Rule_Ver"]/../ATTRIBUTE_DATA')[0]

Since you can't use lxml, you must iterate the STIG_DATA elements by hand, like so:

def GetData(curTag):
    for stig in curTag.findall('STIG_DATA'):
        if stig.find('VULN_ATTRIBUTE').text == 'Rule_Ver':
            return stig.find('ATTRIBUTE_DATA')

Here is a complete program with error checking added to GetData():

import xml.etree.ElementTree as ET
doc = ET.parse('test.ckl')
root=doc.getroot()

TagList = doc.findall("./VULN")

def GetData(curTag):
    for stig in curTag.findall('STIG_DATA'):
        vuln = stig.find('VULN_ATTRIBUTE')
        if vuln is not None and vuln.text == 'Rule_Ver':
            data = stig.find('ATTRIBUTE_DATA')
            return data

for curTag in TagList:
    StatusTag = curTag.find("STATUS")
    CommentTag = curTag.find("COMMENTS")
    DataTag = GetData(curTag)
    print "GEN:[%s] Status:[%s] Comments: %s" %( DataTag.text, StatusTag.text, CommentTag.text)

References:

Upvotes: 2

Related Questions