
xml pandas elementtree

lil-wolf

Reputation: 382

XML to Pandas Dataframe conversion

XML File :

<start>
    <Hit>
         <hits path="xxxxx" id="xx" title="xxx"/>
         <hits path="aaaaa" id="aa" title="aaa"/>
    </Hit>
    <Hit>
         <hits path="bbbbb" id="bb" title="bbb"/>
    </Hit>
    <Hit>
         <hits path="qqqqq" id="qq" title="qqq"/>
         <hits path="wwwww" id="ww" title="www"/>
         <hits path="ttttt" id="tt" title="ttt"/>
    </Hit>
</start>

Python code :

import xml.etree.ElementTree as et
import pandas as pd

# xml_data is the path to the XML file
tree = et.parse(xml_data)
root = tree.getroot()

all_records = []
for child in root:
    record = child.attrib.values()
    all_records.append(record)
pd1 = pd.DataFrame(all_records, columns=["path", "id", "title"])

I have an unstructured XML file. Each Hit element can have an arbitrary number of hits sub-elements.
I want to build a list of the first hits sub-element from every Hit element.

Expected DataFrame content:

   path    id    title
0  xxxxx   xx    xxx
1  bbbbb   bb    bbb
2  qqqqq   qq    qqq

That's it. All the other items should be ignored.

record = child.attrib.values()

This line of code takes the values from all hits elements, i.e. 6 records in total. I want only 3 records, since there are only 3 Hit tags.

How to do it?

Upvotes: 1

Views: 2392

Answers (1)

jezrael

Reputation: 862661

I think you need to change:

record = child.attrib.values()

to:

record = child[0].attrib.values()

to select only the first hits sub-element of each Hit.

List comprehension solution:

all_records = [child[0].attrib.values() for child in root ]
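
To illustrate why child[0] works: indexing an ElementTree element returns its sub-elements in document order, so child[0] is the first hits element inside each Hit. The sketch below is a minimal illustration, assuming the hits elements are self-closed so the parser accepts the document; XML_TEXT and first_hits are names made up for this example, and the attributes are copied into plain dicts for easier inspection.

```python
import xml.etree.ElementTree as ET

# Shortened sample mirroring the question's file (assumption: hits elements
# are self-closed, otherwise the document is not well-formed XML).
XML_TEXT = """
<start>
    <Hit>
        <hits path="xxxxx" id="xx" title="xxx"/>
        <hits path="aaaaa" id="aa" title="aaa"/>
    </Hit>
    <Hit>
        <hits path="bbbbb" id="bb" title="bbb"/>
    </Hit>
</start>
"""

root = ET.fromstring(XML_TEXT)

# Iterating over root yields the Hit elements; child[0] is the first
# hits sub-element of each one. dict(...) copies its attributes.
first_hits = [dict(child[0].attrib) for child in root]
print(first_hits)
```

Only one record per Hit is collected here; the second hits element of the first Hit is skipped.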

If some Hit elements may be empty:

all_records = []
for child in root:
    if len(child) > 0:
        record = child[0].attrib.values()
        all_records.append(record)

List comprehension solution:

all_records = [child[0].attrib.values() for child in root if len(child) > 0]
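
Putting the pieces together, the following is a rough end-to-end sketch rather than a definitive implementation. It assumes the hits elements are self-closed, parses the XML from a string instead of a file, and builds the DataFrame from a list of dicts (a variation on passing attrib.values() directly); XML_TEXT, records and df are names chosen for this example.

```python
import xml.etree.ElementTree as ET

import pandas as pd

# Sample document based on the question, plus one empty Hit to exercise
# the len(child) > 0 guard (assumption: hits elements are self-closed).
XML_TEXT = """
<start>
    <Hit>
        <hits path="xxxxx" id="xx" title="xxx"/>
        <hits path="aaaaa" id="aa" title="aaa"/>
    </Hit>
    <Hit/>
    <Hit>
        <hits path="bbbbb" id="bb" title="bbb"/>
    </Hit>
    <Hit>
        <hits path="qqqqq" id="qq" title="qqq"/>
        <hits path="wwwww" id="ww" title="www"/>
        <hits path="ttttt" id="tt" title="ttt"/>
    </Hit>
</start>
"""

root = ET.fromstring(XML_TEXT)

# Keep the attributes of the first hits element of every non-empty Hit.
records = [dict(child[0].attrib) for child in root if len(child) > 0]

# Fix the column order explicitly instead of relying on attribute order.
df = pd.DataFrame(records, columns=["path", "id", "title"])
print(df)
```

Using dicts keeps each value tied to its attribute name, so a hits element whose attributes appear in a different order still lands in the right columns.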

Upvotes: 2
