Peter Asp

Reputation: 39

Parse xml string by attribute name

I'm using Python and want to extract some values from an XML string.

For example, I have this XML string, which I read from a CSV file:

<Event xmlns='http://schemas.microsoft.com/win/2004/08/events/event'>
    <System>
        <Provider Name='Microsoft-Windows' Guid='{aaaa-ss-www-qqq-qeqweqwe}'/>
        <EventID>4771</EventID>
        <Version>0</Version>
        <Level>0</Level>
        <Task>1000</Task>
        <Opcode>0</Opcode>
        <Keywords>0x9110</Keywords>
        <TimeCreated SystemTime='2022-01-01T00:00:00.000000Z'/>
        <EventRecordID>123123123</EventRecordID>
        <Correlation/>
        <Execution ProcessID='2' ThreadID='11'/>
        <Channel>Security</Channel>
        <Computer>pcname</Computer>
        <Security/>
    </System>
    <EventData>
        <Data Name='TargetUserName'>user</Data>
        <Data Name='TargetSid'>S-1-5-21-123123-321312-123132-31212</Data>
        <Data Name='ServiceName'>service/dom</Data>
        <Data Name='TicketOptions'>0x123123</Data>
        <Data Name='Status'>0xq</Data>
        <Data Name='PreAuthType'>0</Data>
        <Data Name='IpAddress'>::ffff:8.8.8.8</Data>
        <Data Name='IpPort'>123321</Data>
        <Data Name='CertIssuerName'></Data>
        <Data Name='CertSerialNumber'></Data>
        <Data Name='CertThumbprint'></Data>
    </EventData>
</Event>

I have some code with which I can get some of the values by path:

import os, csv
import xml.etree.ElementTree as ET

def cls():
    os.system('cls' if os.name=='nt' else 'clear')
cls()

raw = open('C:/tmp2/data.csv', 'r')
reader = csv.reader(raw)
line_number = 1
for i, row in enumerate(reader):
    if i == line_number:
        break

tree = ET.fromstring(''.join(row))
EventID = [literal.text for literal in tree.findall('.//{http://schemas.microsoft.com/win/2004/08/events/event}System/{http://schemas.microsoft.com/win/2004/08/events/event}EventID')]
TimeCreated = [literal.text for literal in tree.findall('.//{http://schemas.microsoft.com/win/2004/08/events/event}System/{http://schemas.microsoft.com/win/2004/08/events/event}TimeCreated[@Name="SystemTime"]')]
TargetUserName = [literal.text for literal in tree.findall('.//{http://schemas.microsoft.com/win/2004/08/events/event}EventData/{http://schemas.microsoft.com/win/2004/08/events/event}Data[@Name="TargetUserName"]')]
ServiceName = [literal.text for literal in tree.findall('.//{http://schemas.microsoft.com/win/2004/08/events/event}EventData/{http://schemas.microsoft.com/win/2004/08/events/event}Data[@Name="ServiceName"]')]

print ('EVENT:',''.join(EventID))
print ('TimeCreated:',''.join(TimeCreated))
print ('TargetUserName:',''.join(TargetUserName))
print ('ServiceName:', ''.join(ServiceName))

How do I get the value of an attribute, like I do for EventID, by attribute name?

Upvotes: 0

Views: 59

Answers (1)

Jack Fleeting

Reputation: 24940

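You're close, though you should approach the namespaces a bit differently and, if I understand you correctly, modify your TimeCreated expression. SystemTime is itself an attribute of TimeCreated, not the value of a Name attribute, so read it from attrib: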

# The empty-string key maps unprefixed tag names to this namespace (Python 3.8+).
ns = {'': 'http://schemas.microsoft.com/win/2004/08/events/event'}

# SystemTime is an attribute, so read it from attrib rather than from .text.
TimeCreated = [tc.attrib['SystemTime'] for tc in tree.findall('.//System//TimeCreated[@SystemTime]', namespaces=ns)]
EventID = [eid.text for eid in tree.findall('.//System//EventID', namespaces=ns)]
TargetUserName = [tun.text for tun in tree.findall('.//EventData//Data[@Name="TargetUserName"]', namespaces=ns)]
ServiceName = [sn.text for sn in tree.findall('.//EventData//Data[@Name="ServiceName"]', namespaces=ns)]

Output of your print statements, given your sample xml, should be:

EVENT: 4771
TimeCreated: 2022-01-01T00:00:00.000000Z
TargetUserName: user
ServiceName: service/dom
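
If you need several of the Data values, it may be simpler to collect them all into a dict keyed by their Name attribute and look them up from there. The following is only a sketch under a few assumptions: the XML is a trimmed copy of your sample, the 'ev' prefix and the parse_event helper are names I made up for illustration, and an explicit prefix is used instead of the empty-string key so that it should also work on Python versions older than 3.8.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample event from the question.
SAMPLE = """<Event xmlns='http://schemas.microsoft.com/win/2004/08/events/event'>
    <System>
        <EventID>4771</EventID>
        <TimeCreated SystemTime='2022-01-01T00:00:00.000000Z'/>
    </System>
    <EventData>
        <Data Name='TargetUserName'>user</Data>
        <Data Name='ServiceName'>service/dom</Data>
        <Data Name='CertIssuerName'></Data>
    </EventData>
</Event>"""

# 'ev' is an arbitrary prefix chosen for this sketch; only the URI matters.
NS = {'ev': 'http://schemas.microsoft.com/win/2004/08/events/event'}


def parse_event(xml_text):
    """Hypothetical helper: return (event_id, system_time, data_dict)."""
    root = ET.fromstring(xml_text)

    # Element text: findtext returns the text of the first match (or None).
    event_id = root.findtext('ev:System/ev:EventID', namespaces=NS)

    # Attribute value: find the element, then read the attribute with .get().
    time_created = root.find('ev:System/ev:TimeCreated', NS)
    system_time = time_created.get('SystemTime') if time_created is not None else None

    # Map every <Data Name='...'> to its text; empty elements become ''.
    data = {
        d.get('Name'): (d.text or '')
        for d in root.iterfind('ev:EventData/ev:Data', NS)
    }
    return event_id, system_time, data


event_id, system_time, data = parse_event(SAMPLE)
print('EVENT:', event_id)
print('TimeCreated:', system_time)
print('TargetUserName:', data.get('TargetUserName'))
print('ServiceName:', data.get('ServiceName'))
```

With the dict in hand, any other field is just data.get('IpAddress') and so on, without writing a new path expression for each name.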

Upvotes: 1
