Reputation: 664
I have an XML file, and part of it looks like this:
<?xml version="1.0" encoding="UTF-8" ?>,
<Settings>,
<System>,
<Format>Percent</Format>,
<Time>12 Hour Format</Time>,
<Set>Standard</Set>,
</System>,
<System>,
<Format>Percent</Format>,
<Time>12 Hour Format</Time>,
<Set>Standard</Set>,
<Alarm>ON</Alarm>,
<Haptic>ON</Haptic>'
</System>
</Settings>
What I would like to do is use XPath to specify the path //Settings/System and get the tags and values of each System element, so that I can populate a dataframe with the following output:
| Format | Time | Set | Alarm | Haptic |
|:-------|:-----|:----|:------|:-------|
| Percent | 12 Hour Format | Standard | NaN | NaN |
| Percent | 12 Hour Format | Standard | ON | ON |
So far I have seen methods as follows:
import xml.etree.ElementTree as ET
root = ET.parse(filename)
result = ''
for elem in root.findall('.//child/grandchild'):
    # How to make decisions based on attributes even in 2.6:
    if elem.attrib.get('name') == 'foo':
        result = elem.text
These methods explicitly rely on elem.attrib.get('name'), which I cannot use in my case because the child elements within my /System tags are inconsistent. So what I am asking is: is there a way, using XPath or anything else, to specify /System and get all of its child elements and their values?
Upvotes: 1
Views: 124
Reputation: 24930
Your XML is still not well formed, but assuming it's fixed and looks like the earlier version, the following should work:
#fixed xml
<?xml version="1.0" encoding="UTF-8" ?>
<Settings>
  <System>
    <Format>Percent</Format>
    <Time>12 Hour Format</Time>
    <Set>Standard</Set>
  </System>
  <System>
    <Format>Percent</Format>
    <Time>12 Hour Format</Time>
    <Set>Standard</Set>
    <Alarm>ON</Alarm>
    <Haptic>ON</Haptic>
  </System>
</Settings>
Now for the code itself:
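import xml.etree.ElementTree as ET
import pandas as pd

# filename is the path to the fixed XML file shown above
root = ET.parse(filename).getroot()

rows, tags = [], []
# get all unique element names, in order of first appearance
for elem in root.findall('System//*'):
    if elem.tag not in tags:
        tags.append(elem.tag)
# now collect the required info, one row per <System>;
# a tag missing from a <System> becomes None
for elem in root.findall('System'):
    rows.append([elem.find(tag).text if elem.find(tag) is not None else None for tag in tags])
pd.DataFrame(rows, columns=tags)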
Output:
    Format            Time       Set  Alarm  Haptic
0  Percent  12 Hour Format  Standard   None    None
1  Percent  12 Hour Format  Standard     ON      ON
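As a possible alternative, here is a minimal sketch that builds one dict per System element instead of collecting the tag names first. The helper name system_records and the inlined XML_TEXT constant are assumptions made for illustration, not part of the original code, and the sketch assumes every child of System is a simple element with text.

```python
import xml.etree.ElementTree as ET

# The fixed XML from above, inlined so the sketch is self-contained.
XML_TEXT = """<?xml version="1.0" encoding="UTF-8" ?>
<Settings>
  <System>
    <Format>Percent</Format>
    <Time>12 Hour Format</Time>
    <Set>Standard</Set>
  </System>
  <System>
    <Format>Percent</Format>
    <Time>12 Hour Format</Time>
    <Set>Standard</Set>
    <Alarm>ON</Alarm>
    <Haptic>ON</Haptic>
  </System>
</Settings>"""


def system_records(xml_text):
    """Return one {tag: text} dict per <System> element (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    # Iterating over an Element yields its direct children, so no tag names
    # need to be known in advance.
    return [
        {child.tag: child.text for child in system}
        for system in root.findall("System")
    ]


records = system_records(XML_TEXT)
for record in records:
    print(record)
```

Passing records to pd.DataFrame should then give one row per System, with keys missing from a dict filled in as NaN, which is closer to the output asked for in the question.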
Upvotes: 1