Converting complex XML file to Pandas dataframe/CSV - Python

Question

I'm currently converting a complex XML file to CSV or a pandas DataFrame. I have zero experience with the XML data format, and none of the code suggestions I found online are working for me. Can anyone kindly help me with this?

There are lots of elements in the data that I do not need so I won't include those here.

For privacy reasons I won't be uploading the original data here but I'll be sharing what the structure looks like.

I would be needing everything in the tag.

Ideally I want the headers and output to look like this:

I would sincerely appreciate any help I can get on this. Thanks a mil.

Jack Fleeting · Accepted Answer

Another way to do it, using lxml and xpath:

   import pandas as pd
   from lxml import etree

   dat = """[your FIXED xml]"""
   doc = etree.fromstring(dat)

   # Container elements to leave out of the columns
   to_delete = ["TradeDetails", "Attributes"]

   # Each <RefData> element becomes one row
   body = doc.xpath('.//RefData')

   # Column names: tags of every descendant of the first <RefData>, minus the containers
   columns = [el.tag for el in body[0].xpath('.//*') if el.tag not in to_delete]

   rows = []
   for b in body:
       # Text of every descendant of this <RefData>, in document order, minus the containers
       row = [item.text for item in b.xpath('.//*') if item.tag not in to_delete]
       rows.append(row)

   df = pd.DataFrame(rows, columns=columns)
   print(df)

Output is the dataframe indicated in your question.
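
If you only need the CSV and would rather not install lxml or pandas, the same idea can be sketched with the standard library alone. This is a minimal sketch, not a drop-in solution: the sample XML below is hypothetical (only `RefData`, `TradeDetails` and `Attributes` are taken from the code above; `Root`, `Id`, `Price` and `Currency` are made-up names), and it assumes every `RefData` contains the same elements in the same order.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample: Root, Id, Price and Currency are invented for illustration.
SAMPLE_XML = """<Root>
  <RefData>
    <Id>1</Id>
    <TradeDetails>
      <Price>10.5</Price>
    </TradeDetails>
    <Attributes>
      <Currency>USD</Currency>
    </Attributes>
  </RefData>
  <RefData>
    <Id>2</Id>
    <TradeDetails>
      <Price>11.0</Price>
    </TradeDetails>
    <Attributes>
      <Currency>EUR</Currency>
    </Attributes>
  </RefData>
</Root>"""

# Container elements that should not become columns
CONTAINER_TAGS = {"TradeDetails", "Attributes"}


def data_elements(record):
    """Return the descendants of `record` in document order, minus containers."""
    # iter() yields the record itself first, so it is skipped explicitly.
    return [el for el in record.iter()
            if el is not record and el.tag not in CONTAINER_TAGS]


def refdata_to_rows(xml_text):
    """Turn every <RefData> element into one row of text values."""
    root = ET.fromstring(xml_text)
    records = root.findall(".//RefData")
    if not records:
        return [], []
    # Assumption: the first record has the same layout as all the others.
    columns = [el.tag for el in data_elements(records[0])]
    rows = [[(el.text or "").strip() for el in data_elements(rec)]
            for rec in records]
    return columns, rows


def rows_to_csv(columns, rows):
    """Serialize a header plus rows to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)
    return buffer.getvalue()


columns, rows = refdata_to_rows(SAMPLE_XML)
csv_text = rows_to_csv(columns, rows)
print(csv_text)
```

To write a file instead of a string, open it with `open("out.csv", "w", newline="")` and pass that handle to `csv.writer`. If records can differ in which elements they contain, building one dict per record and using `csv.DictWriter` would likely be safer than relying on position.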
