I'm trying to parse the XML below and then represent it as a pandas DataFrame:
<?xml version="1.0"?><results>
<header>
<cloc_url>github.com/AlDanial/cloc</cloc_url>
<cloc_version>1.74</cloc_version>
<elapsed_seconds>0.940369129180908</elapsed_seconds>
<n_files>124</n_files>
<n_lines>8440</n_lines>
<files_per_second>131.863112209998</files_per_second>
<lines_per_second>8975.19892784178</lines_per_second>
<report_file>/Users/hariomsingh/Desktop/ignitechute/Repo/ignite-chute-aem_cloc.xml</report_file>
</header>
<files>
<file name="/Users/hariomsingh/Desktop/ignitechute/Repo/ignite-chute-aem/aem-parent/pom.xml" blank="13" comment="23" code="491" language="Maven" />
<file name="/Users/hariomsingh/Desktop/ignitechute/Repo/ignite-chute-aem/aem-core/aem-core-bundle/src/test/resources/assets.json" blank="0" comment="0" code="357" language="JSON" />
<file name="/Users/hariomsingh/Desktop/ignitechute/Repo/ignite-chute-aem/aem-core/aem-core-bundle/src/main/java/com/chute/aem/core/api/impl/UserServiceImpl.java" blank="26" comment="21" code="202" language="Java" />
The output should look something like this:
file name                                   blank  comment  language  code
Repo/ignite-chute-aem/aem-parent/pom.xml    13     23       Maven     491
<fullpath>/assets.json                      12     3        c         432
So far I've only managed to write a few lines:
import pandas as pd
from xml.etree import ElementTree
tree = ElementTree.parse('/Users/hariomsingh/Desktop/individualxml/ignite-chute-aem_cloc.xml')
root = tree.getroot()  # the top-level <results> element
print(root)
print(tree.iter())  # prints the iterator object itself, not the elements it yields
csv_data = []
fields = ['file name','blank','comment', 'language', 'code']
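For comparison, the standard-library route started above can be finished without installing anything beyond pandas. This is a minimal sketch, assuming the cloc layout shown in the excerpt (each `<file>` element carries `name`, `blank`, `comment`, `code` and `language` attributes). It parses an inline copy of just the `<files>` part, closed off so it is well-formed, instead of the report file; for the real file you would keep `ElementTree.parse(path).getroot()`.

```python
import xml.etree.ElementTree as ElementTree

import pandas as pd

# Inline copy of the <files> part of the cloc excerpt, closed off so it is
# well-formed. The XML declaration must be the very first thing in the string.
xml_text = """<?xml version="1.0"?><results>
<files>
<file name="/Users/hariomsingh/Desktop/ignitechute/Repo/ignite-chute-aem/aem-parent/pom.xml" blank="13" comment="23" code="491" language="Maven" />
<file name="/Users/hariomsingh/Desktop/ignitechute/Repo/ignite-chute-aem/aem-core/aem-core-bundle/src/test/resources/assets.json" blank="0" comment="0" code="357" language="JSON" />
<file name="/Users/hariomsingh/Desktop/ignitechute/Repo/ignite-chute-aem/aem-core/aem-core-bundle/src/main/java/com/chute/aem/core/api/impl/UserServiceImpl.java" blank="26" comment="21" code="202" language="Java" />
</files>
</results>"""

root = ElementTree.fromstring(xml_text)

# Attribute names on each <file> element, listed in the desired column order
attrs = ['name', 'blank', 'comment', 'language', 'code']

csv_data = []
# root.iter('file') yields every <file> element, however deeply nested
for file_elem in root.iter('file'):
    # .get() reads one attribute; every value comes back as a string
    csv_data.append([file_elem.get(attr) for attr in attrs])

df = pd.DataFrame(csv_data, columns=attrs)

# Convert the line counts from strings to integers so they sort and sum correctly
count_cols = ['blank', 'comment', 'code']
df[count_cols] = df[count_cols].astype(int)

# Use the column header the question asks for
df = df.rename(columns={'name': 'file name'})
print(df)
```

Reading the attributes by name, rather than taking `attrib` wholesale, fixes the column order up front, so no reordering is needed afterwards.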
Assuming you're OK with installing beautifulsoup4 (i.e., pip3 install beautifulsoup4), lxml (i.e., pip3 install lxml, which BeautifulSoup needs for the 'lxml' parser used below) and pandas (i.e., pip3 install pandas), this should do the trick:
from bs4 import BeautifulSoup as Soup
import pandas
xml = """
<?xml version="1.0"?><results>
<header>
<cloc_url>github.com/AlDanial/cloc</cloc_url>
<cloc_version>1.74</cloc_version>
<elapsed_seconds>0.940369129180908</elapsed_seconds>
<n_files>124</n_files>
<n_lines>8440</n_lines>
<files_per_second>131.863112209998</files_per_second>
<lines_per_second>8975.19892784178</lines_per_second>
<report_file>/Users/hariomsingh/Desktop/ignitechute/Repo/ignite-chute-aem_cloc.xml</report_file>
</header>
<files>
<file name="/Users/hariomsingh/Desktop/ignitechute/Repo/ignite-chute-aem/aem-parent/pom.xml" blank="13" comment="23" code="491" language="Maven" />
<file name="/Users/hariomsingh/Desktop/ignitechute/Repo/ignite-chute-aem/aem-core/aem-core-bundle/src/test/resources/assets.json" blank="0" comment="0" code="357" language="JSON" />
<file name="/Users/hariomsingh/Desktop/ignitechute/Repo/ignite-chute-aem/aem-core/aem-core-bundle/src/main/java/com/chute/aem/core/api/impl/UserServiceImpl.java" blank="26" comment="21" code="202" language="Java" />
"""
soup = Soup(xml, 'lxml')
records = []
# the attributes of each <file> tag (name, blank, comment, code, language) become one row
for file in soup.find_all('file'):
    records.append(file.attrs)
data_table = pandas.DataFrame(records)
# print the table without the long file names so the other fields are easier to see
print(data_table.drop('name', axis=1))
# print just the names (or at least the part that pandas shows by default)
print(data_table['name'])
# save the table to disk so you can view all of it in Excel or similar
data_table.to_csv('output.csv', index=False)
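The table built above still holds every value as a string, uses `name` as the header, and takes its column order from the attribute dicts, which need not match the layout asked for in the question. Here is a minimal sketch of the post-processing that closes that gap. It assumes `records` looks like the list the loop produces (recreated inline so the snippet runs on its own), and `repo_root` is a hypothetical checkout prefix to strip so paths start at `Repo/`, as in the desired output.

```python
import pandas as pd

# Stand-in for `records` from the loop above: one dict of attribute strings per <file> tag
records = [
    {'name': '/Users/hariomsingh/Desktop/ignitechute/Repo/ignite-chute-aem/aem-parent/pom.xml',
     'blank': '13', 'comment': '23', 'code': '491', 'language': 'Maven'},
    {'name': '/Users/hariomsingh/Desktop/ignitechute/Repo/ignite-chute-aem/aem-core/aem-core-bundle/src/test/resources/assets.json',
     'blank': '0', 'comment': '0', 'code': '357', 'language': 'JSON'},
    {'name': '/Users/hariomsingh/Desktop/ignitechute/Repo/ignite-chute-aem/aem-core/aem-core-bundle/src/main/java/com/chute/aem/core/api/impl/UserServiceImpl.java',
     'blank': '26', 'comment': '21', 'code': '202', 'language': 'Java'},
]
data_table = pd.DataFrame(records)

# Hypothetical checkout location; everything before "Repo/" gets dropped
repo_root = '/Users/hariomsingh/Desktop/ignitechute/'

# Attribute values arrive as strings, so convert the counts to integers
count_cols = ['blank', 'comment', 'code']
data_table[count_cols] = data_table[count_cols].astype(int)

# Trim the absolute prefix (Series.str.removeprefix needs pandas 1.4+)
data_table['name'] = data_table['name'].str.removeprefix(repo_root)

# Rename and reorder the columns to match the requested layout
data_table = data_table.rename(columns={'name': 'file name'})
data_table = data_table[['file name', 'blank', 'comment', 'language', 'code']]
print(data_table)
```

Converting the counts before writing `output.csv` makes no difference to the CSV text itself. It does matter if you go on to sort or total the columns in pandas.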