Reputation: 4635
I am trying to extract the following values from an XML file:
Name, Mode, LEVELS, Group, Type
and then build a pandas DataFrame from them. The problem I am having so far is that I cannot get the <Name>ALICE</Name> values, and the output DataFrame format is different from what I need.
I used some existing posts when I built my read_xml function.
Here is the example XML file format:
<?xml version="1.0"?>
<Body>
  <DocType>List</DocType>
  <DocVersion>1</DocVersion>
  <LIST>
    <Name>ALICE</Name>
    <Variable>
      <Mode>Hole</Mode>
      <LEVELS>1</LEVELS>
      <Group>11</Group>
      <Type>0</Type>
      <Paint />
    </Variable>
    <Variable>
      <Mode>BEWEL</Mode>
      <LEVELS>2</LEVELS>
      <Group>22</Group>
      <Type>0</Type>
      <Paint />
    </Variable>
    <Name>WONDERLAND</Name>
    <Variable>
      <Mode>Mole</Mode>
      <LEVELS>1</LEVELS>
      <Group>11</Group>
      <Type>0</Type>
      <Paint />
    </Variable>
    <Variable>
      <Mode>Matrix</Mode>
      <LEVELS>6</LEVELS>
      <Group>66</Group>
      <Type>0</Type>
      <Paint />
    </Variable>
  </LIST>
</Body>
I built the following function:

import xml.etree.ElementTree as ET
import pandas as pd

xml_file = r"C:\xml.xml"

def read_xml(xml_file):
    etree = ET.parse(xml_file)
    root = etree.getroot()
    items = []
    for item in root.findall('./LIST/'):
        values = {}
        for it in item:
            #print(it)
            values[it.tag] = it.text
        items.append(values)
    columns = ['Name','Mode', 'LEVELS','Group','Type']
    df = pd.DataFrame(items, columns = columns)
    return df

print(read_xml(xml_file))
which gives me this output:

  Name    Mode LEVELS Group Type
0  NaN     NaN    NaN   NaN  NaN
1  NaN    Hole      1    11    0
2  NaN   BEWEL      2    22    0
3  NaN     NaN    NaN   NaN  NaN
4  NaN    Mole      1    11    0
5  NaN  Matrix      6    66    0
The expected output:

         NAME    MODE LEVELS Group Type
1       ALICE    Hole      1    11    0
2       ALICE   BEWEL     11    22    0
3  WONDERLAND    MOLE      1    11    0
4  WONDERLAND  MATRIX      6    66    0
How can I get the expected output?
Thanks!
Upvotes: 0
Views: 1399
Reputation: 1
import xml.etree.ElementTree as ET
import pandas as pd

def xml_to_df(xml_file):
    tree = ET.parse(xml_file)
    root = tree.getroot()
    data = []
    # One record per direct child of the root element
    for child in root:
        record = {}
        # Each grandchild's tag becomes a column, its text the value
        for subchild in child:
            record[subchild.tag] = subchild.text
        data.append(record)
    df = pd.DataFrame(data)
    return df
Upvotes: 0
Reputation: 862661
If the tag is Name, store its text in a variable inside the loop, then add it to the values dictionary of each Variable element that follows:
import xml.etree.ElementTree as ET  # cElementTree was removed in Python 3.9
import pandas as pd

def read_xml(xml_file):
    etree = ET.parse(xml_file)
    root = etree.getroot()
    items = []
    name = None  # most recent <Name> seen so far
    for item in root.findall('LIST/'):
        # A <Name> element labels the <Variable> elements that follow it
        if item.tag == 'Name':
            name = item.text
            continue
        values = {}
        for it in item:
            values[it.tag] = it.text
        values['Name'] = name
        items.append(values)
    columns = ['Name','Mode', 'LEVELS','Group','Type']
    df = pd.DataFrame(items, columns = columns)
    return df

xml_file = 'xml.xml'
print(read_xml(xml_file))

         Name    Mode LEVELS Group Type
0       ALICE    Hole      1    11    0
1       ALICE   BEWEL      2    22    0
2  WONDERLAND    Mole      1    11    0
3  WONDERLAND  Matrix      6    66    0
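
To try the same idea without a file on disk, here is a minimal sketch that parses the XML from a string and collects the rows as plain dictionaries. The names XML_TEXT, FIELDS and rows_from_list are hypothetical and exist only for this illustration. It assumes every <Name> comes before the <Variable> elements it labels, as in the sample above, and it spells the implicit trailing-slash wildcard out as 'LIST/*'.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample document (assumption: same layout as in the question).
XML_TEXT = """<?xml version="1.0"?>
<Body>
  <LIST>
    <Name>ALICE</Name>
    <Variable><Mode>Hole</Mode><LEVELS>1</LEVELS><Group>11</Group><Type>0</Type><Paint /></Variable>
    <Variable><Mode>BEWEL</Mode><LEVELS>2</LEVELS><Group>22</Group><Type>0</Type><Paint /></Variable>
    <Name>WONDERLAND</Name>
    <Variable><Mode>Mole</Mode><LEVELS>1</LEVELS><Group>11</Group><Type>0</Type><Paint /></Variable>
    <Variable><Mode>Matrix</Mode><LEVELS>6</LEVELS><Group>66</Group><Type>0</Type><Paint /></Variable>
  </LIST>
</Body>
"""

# Child tags of <Variable> to keep (hypothetical helper constant).
FIELDS = ["Mode", "LEVELS", "Group", "Type"]


def rows_from_list(xml_text):
    """Return one dict per <Variable>, tagged with the preceding <Name>."""
    root = ET.fromstring(xml_text)
    rows = []
    name = None  # text of the most recent <Name> element
    for item in root.findall("LIST/*"):
        if item.tag == "Name":
            name = item.text
            continue
        if item.tag != "Variable":
            continue  # ignore any other direct children of <LIST>
        row = {"Name": name}
        for field in FIELDS:
            # findtext returns None when the child tag is missing
            row[field] = item.findtext(field)
        rows.append(row)
    return rows


for row in rows_from_list(XML_TEXT):
    print(row)
```

The resulting list can be passed to pd.DataFrame(rows, columns=['Name'] + FIELDS) to get the same table. All values are still strings at that point, so the numeric columns would need an explicit conversion if numbers are required.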
Upvotes: 1