Reputation: 382
XML File :
<start>
  <Hit>
    <hits path="xxxxx" id="xx" title="xxx"/>
    <hits path="aaaaa" id="aa" title="aaa"/>
  </Hit>
  <Hit>
    <hits path="bbbbb" id="bb" title="bbb"/>
  </Hit>
  <Hit>
    <hits path="qqqqq" id="qq" title="qqq"/>
    <hits path="wwwww" id="ww" title="www"/>
    <hits path="ttttt" id="tt" title="ttt"/>
  </Hit>
</start>
Python code :
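import xml.etree.ElementTree as et  # cElementTree was removed in Python 3.9
import pandas as pd

tree = et.parse(xml_data)  # xml_data: path to the XML file
root = tree.getroot()
all_records = []
for child in root:
    record = child.attrib.values()
    all_records.append(record)
pd1 = pd.DataFrame(all_records, columns=['path', 'id', 'title'])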
I have an unstructured XML file. Each Hit element can contain any number of hits sub-elements.
I want to build a list containing only the first hits sub-element of every Hit element.
Expected result:
DataFrame content:
   path   id  title
0  xxxxx  xx  xxx
1  bbbbb  bb  bbb
2  qqqqq  qq  qqq
That's it. All the other items should be ignored.
record = child.attrib.values()
This line of code takes the values from every hits element, i.e. 6 sets of values in total. I want only 3, because there are only 3 Hit tags.
How to do it?
Upvotes: 1
Views: 2392
Reputation: 862661
I think you need to change:
record = child.attrib.values()
to:
record = child[0].attrib.values()
so that only the first hits element of each Hit is selected.
List comprehension solution:
all_records = [child[0].attrib.values() for child in root]
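As a minimal, self-contained sketch of the change above, the following parses a well-formed copy of the question's sample from a string and keeps only the first sub-element of each Hit. The name XML_DATA and the self-closing hits tags are assumptions made for illustration; only the standard library is used.

```python
import xml.etree.ElementTree as ET

# Assumed well-formed version of the sample XML from the question.
XML_DATA = """
<start>
  <Hit>
    <hits path="xxxxx" id="xx" title="xxx"/>
    <hits path="aaaaa" id="aa" title="aaa"/>
  </Hit>
  <Hit>
    <hits path="bbbbb" id="bb" title="bbb"/>
  </Hit>
  <Hit>
    <hits path="qqqqq" id="qq" title="qqq"/>
    <hits path="wwwww" id="ww" title="www"/>
    <hits path="ttttt" id="tt" title="ttt"/>
  </Hit>
</start>
"""

root = ET.fromstring(XML_DATA)

# child[0] is the first sub-element of each Hit; dict() copies its attributes.
first_hits = [dict(child[0].attrib) for child in root]

for row in first_hits:
    print(row)
```

If pandas is installed, passing this list of dictionaries to pd.DataFrame(first_hits) should give the three-row table from the question, with the column names taken from the dictionary keys.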
If some Hit elements may be empty:
all_records = []
for child in root:
    if len(child) > 0:
        record = child[0].attrib.values()
        all_records.append(record)
List comprehension solution:
all_records = [child[0].attrib.values() for child in root if len(child) > 0]
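An alternative sketch, not part of the answer above: Element.find('hits') returns the first matching child or None, so empty Hit elements can be skipped without indexing. The helper name first_hit_attributes and the sample string (including its empty Hit) are hypothetical.

```python
import xml.etree.ElementTree as ET


def first_hit_attributes(root):
    """Return the attributes of the first <hits> child of every <Hit>, skipping empty ones."""
    records = []
    for hit in root.findall("Hit"):
        first = hit.find("hits")  # None when the Hit has no hits child
        # Compare with None explicitly: an Element without children is falsy.
        if first is not None:
            records.append(dict(first.attrib))
    return records


# Hypothetical sample with one empty Hit in the middle.
sample = ET.fromstring(
    "<start>"
    '<Hit><hits path="xxxxx" id="xx" title="xxx"/><hits path="aaaaa" id="aa" title="aaa"/></Hit>'
    "<Hit/>"
    '<Hit><hits path="bbbbb" id="bb" title="bbb"/></Hit>'
    "</start>"
)

print(first_hit_attributes(sample))
```

Unlike child[0], this variant also ignores any children of a Hit that are not named hits, which may or may not be what you want.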
Upvotes: 2