Parse XML using pandas

Question

I am trying to parse the XML below and then represent it as a pandas DataFrame.



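  <cloc_url>github.com/AlDanial/cloc</cloc_url>
  <cloc_version>1.74</cloc_version>
  <elapsed_seconds>0.940369129180908</elapsed_seconds>
  <n_files>124</n_files>
  <n_lines>8440</n_lines>
  <files_per_second>131.863112209998</files_per_second>
  <lines_per_second>8975.19892784178</lines_per_second>
  <report_file>/Users/hariomsingh/Desktop/ignitechute/Repo/ignite-chute-aem_cloc.xml</report_file>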

The output should look something like this:

file name                                 blank  comment  language  code
Repo/ignite-chute-aem/aem-parent/pom.xml  13     23       Maven     491
/assets.json                              12     3        c         432

So far I have only managed to write a few lines:

import pandas as pd
from xml.etree import ElementTree
tree = ElementTree.parse('/Users/hariomsingh/Desktop/individualxml/ignite-chute-aem_cloc.xml')
root = tree.getroot()

print(root)
print(tree.iter())

csv_data = []
fields =  ['file name','blank','comment', 'language', 'code']

TSeymour · Accepted Answer

Assuming you're OK with installing beautifulsoup4 and lxml (i.e., pip3 install beautifulsoup4 lxml; the 'lxml' parser used below needs the lxml package) as well as pandas (i.e., pip3 install pandas), this should do the trick:

from bs4 import BeautifulSoup as Soup
import pandas

xml = """


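  <cloc_url>github.com/AlDanial/cloc</cloc_url>
  <cloc_version>1.74</cloc_version>
  <elapsed_seconds>0.940369129180908</elapsed_seconds>
  <n_files>124</n_files>
  <n_lines>8440</n_lines>
  <files_per_second>131.863112209998</files_per_second>
  <lines_per_second>8975.19892784178</lines_per_second>
  <report_file>/Users/hariomsingh/Desktop/ignitechute/Repo/ignite-chute-aem_cloc.xml</report_file>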


  
  
  
"""

soup = Soup(xml, 'lxml')

records = []

# each <file> element keeps its fields as attributes, so its attrs dict becomes one row
for file in soup.find_all('file'):
    records.append(file.attrs)

data_table = pandas.DataFrame(records)

# this prints the table without the long file name to ease seeing all other fields
print(data_table.drop('name', axis=1))

# this prints just the names (or at least the bit that pandas prints by default)
print(data_table['name'])

# saving them to disk so you can see the entire table in excel or similar
data_table.to_csv('output.csv', index=False)
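
For comparison, the same rows can be collected with the standard-library parser the question started with. This is only a minimal sketch, assuming the report holds one file element per source file with its fields stored as attributes (which is what the use of file.attrs above relies on). The sample XML, its paths and numbers, and the helper names file_records and to_dataframe are made up for illustration and are not taken from the real report.

```python
from xml.etree import ElementTree

# Hypothetical sample shaped like a per-file cloc report; all values are invented.
SAMPLE_XML = """<results>
<files>
  <file name="src/pom.xml" blank="13" comment="23" code="491" language="Maven" />
  <file name="src/assets.json" blank="12" comment="3" code="432" language="JSON" />
</files>
</results>"""

# Attributes that hold counts; XML attributes always arrive as strings.
NUMERIC_FIELDS = ("blank", "comment", "code")


def file_records(xml_text):
    """Return one dict per <file> element, with the count fields converted to int."""
    root = ElementTree.fromstring(xml_text)
    records = []
    for element in root.iter("file"):
        row = dict(element.attrib)
        for field in NUMERIC_FIELDS:
            if field in row:
                row[field] = int(row[field])
        records.append(row)
    return records


def to_dataframe(records):
    """Build a DataFrame with the columns in the order the question asks for."""
    # pandas is imported here so the parsing step above works without it.
    import pandas as pd
    return pd.DataFrame(records, columns=["name", "blank", "comment", "language", "code"])
```

Calling to_dataframe(file_records(SAMPLE_XML)) should give a two-row table. For a report on disk, ElementTree.parse(path).getroot() can stand in for ElementTree.fromstring. Converting the counts to int up front means sums and sorts on those columns behave numerically rather than as text.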
