Edward Coom

Reputation: 37

Parsing XML to CSV with Python

I'd like to parse an XML file into CSV and display the result in a format like this:

[Image: table of the desired output, one row per item with namelink and description columns]

I have successfully extracted each element's text into the CSV file. I'd like to pair each namelink with its description in a row, with the text of each element in its own column, as shown in the table.

The original XML file: [image of the XML file]

My current attempt:

# Importing the required libraries
import xml.etree.ElementTree as Xet
import pandas as pd

# Parsing the XML file
xmlparse = Xet.parse('NiktoReportTest.xml')
root = xmlparse.getroot()

cols = ["namelink", "description"]
rows = []


x = []
for elm in root.findall("./niktoscan/scandetails/item/namelink"):
    x.append(elm.text)

y = []
for value in root.findall("./niktoscan/scandetails/item/description"):
    y.append(value.text)


rows.append({"namelink": x,
             "description": y})

df = pd.DataFrame(rows, columns=cols)

# Writing dataframe to csv
df.to_csv('output.csv')

The current output of the CSV file:

,namelink,description
0,"['http://127.0.0.1:80/', 'http://127.0.0.1:80/', 'http://127.0.0.1:80/', 'http://127.0.0.1:80/', 'http://127.0.0.1:80/', 'http://127.0.0.1:80/./', 'http://127.0.0.1:80/./', 'http://127.0.0.1:80//', 'http://127.0.0.1:80//', 'http://127.0.0.1:80/%2e/', 'http://127.0.0.1:80/%2e/', 'http://127.0.0.1:80///etc/hosts', 'http://127.0.0.1:80///', 'http://127.0.0.1:80/server-status', 'http://127.0.0.1:80/?PageServices', 'http://127.0.0.1:80/?wp-cs-dump', 'http://127.0.0.1:80///////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////', 'http://127.0.0.1:80///////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////', 'http://127.0.0.1:80/wp-content/themes/twentyeleven/images/headers/server.php?filesrc=/etc/hosts', 'http://127.0.0.1:80/wordpresswp-content/themes/twentyeleven/images/headers/server.php?filesrc=/etc/hosts', 'http://127.0.0.1:80/wp-includes/Requests/Utility/content-post.php?filesrc=/etc/hosts', 'http://127.0.0.1:80/wordpresswp-includes/Requests/Utility/content-post.php?filesrc=/etc/hosts', 'http://127.0.0.1:80/wp-includes/js/tinymce/themes/modern/Meuhy.php?filesrc=/etc/hosts', 'http://127.0.0.1:80/wordpresswp-includes/js/tinymce/themes/modern/Meuhy.php?filesrc=/etc/hosts', 'http://127.0.0.1:80/assets/mobirise/css/meta.php?filesrc=', 'http://127.0.0.1:80/login.cgi?cli=aa%20aa%27cat%20/etc/hosts', 'http://127.0.0.1:80/shell?cat+/etc/hosts']","['The anti-clickjacking X-Frame-Options header is not present.', 'The X-XSS-Protection header is not defined. This header can hint to the user agent to protect against some forms of XSS', 'The X-Content-Type-Options header is not set. 
This could allow the user agent to render the content of the site in a different fashion to the MIME type', '/: Directory indexing found.', 'Allowed HTTP Methods: POST, OPTIONS, HEAD, GET ', '/./: Directory indexing found.', ""/./: Appending '/./' to a directory allows indexing"", '//: Directory indexing found.', '//: Apache on Red Hat Linux release 9 reveals the root directory listing by default if there is no index page.', '/%2e/: Directory indexing found.', '/%2e/: Weblogic allows source code or directory listing, upgrade to v6.0 SP1 or higher. BID-2513.', ""///etc/hosts: The server install allows reading of any system file by adding an extra '/' to the URL."", '///: Directory indexing found.', '/server-status: This reveals Apache information. Comment out appropriate line in the Apache conf file or restrict access to allowed sources.', ""/?PageServices: The remote server may allow directory listings through Web Publisher by forcing the server to show all files via 'open directory browsing'. Web Publisher should be disabled. CVE-1999-0269."", ""/?wp-cs-dump: The remote server may allow directory listings through Web Publisher by forcing the server to show all files via 'open directory browsing'. Web Publisher should be disabled. 
CVE-1999-0269."", '///////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////: Directory indexing found.', ""///////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////: Abyss 1.03 reveals directory listing when \t /'s are requested."", '/wp-content/themes/twentyeleven/images/headers/server.php?filesrc=/etc/hosts: A PHP backdoor file manager was found.', '/wordpresswp-content/themes/twentyeleven/images/headers/server.php?filesrc=/etc/hosts: A PHP backdoor file manager was found.', '/wp-includes/Requests/Utility/content-post.php?filesrc=/etc/hosts: A PHP backdoor file manager was found.', '/wordpresswp-includes/Requests/Utility/content-post.php?filesrc=/etc/hosts: A PHP backdoor file manager was found.', '/wp-includes/js/tinymce/themes/modern/Meuhy.php?filesrc=/etc/hosts: A PHP backdoor file manager was found.', '/wordpresswp-includes/js/tinymce/themes/modern/Meuhy.php?filesrc=/etc/hosts: A PHP backdoor file manager was found.', '/assets/mobirise/css/meta.php?filesrc=: A PHP backdoor file manager was found.', '/login.cgi?cli=aa%20aa%27cat%20/etc/hosts: Some D-Link router remote command execution.', '/shell?cat+/etc/hosts: A backdoor was identified.']"

Upvotes: 1

Views: 1482

Answers (1)

Rezart Qelibari

Reputation: 94

I modified your code to write the CSV file directly. There is no need for pandas if you only use it to write a CSV file.

import csv
import xml.etree.ElementTree as Xet

# Parsing the XML file
xmlparse = Xet.parse('test.xml')
root = xmlparse.getroot()

column_names = ["namelink", "description"]
column_values = {}

# Extract column data for all columns defined above
for column_name in column_names:
    column_values[column_name] = []
    for element in root.findall(f'./niktoscan/scandetails/item/{column_name}'):
        column_values[column_name].append(element.text)

# Pair the extracted column values by position, giving one row per item
rows = zip(*column_values.values())


# newline='' lets the csv module control line endings (avoids blank rows on Windows)
with open('output.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(column_names)
    writer.writerows(rows)
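
One caveat: zip pairs the two lists purely by position, so if some item lacks a namelink or a description, every later row would be shifted. A minimal sketch of a variant that walks each item and reads both children from it is shown here. The sample XML (including the root tag name) and the helper names extract_rows and rows_to_csv are made up for illustration, the item path is taken from the XPath in the question, and the CSV is written to an in-memory buffer rather than to output.csv.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample that only mimics the structure implied by the
# XPath in the question; a real Nikto report has more elements.
SAMPLE_XML = """<report>
  <niktoscan>
    <scandetails>
      <item>
        <description>The X-Frame-Options header is not present.</description>
        <namelink>http://127.0.0.1:80/</namelink>
      </item>
      <item>
        <description>/: Directory indexing found.</description>
      </item>
    </scandetails>
  </niktoscan>
</report>"""

COLUMN_NAMES = ["namelink", "description"]


def extract_rows(root, column_names=COLUMN_NAMES):
    """Return one list per <item>, using '' for any missing child element."""
    rows = []
    for item in root.findall("./niktoscan/scandetails/item"):
        # findtext returns the default when the child element is absent
        rows.append([item.findtext(name, default="") for name in column_names])
    return rows


def rows_to_csv(rows, column_names=COLUMN_NAMES):
    """Serialize the header and rows to a CSV string."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    writer.writerow(column_names)
    writer.writerows(rows)
    return buffer.getvalue()


root = ET.fromstring(SAMPLE_XML)
rows = extract_rows(root)
csv_text = rows_to_csv(rows)
print(csv_text)
```

With this approach the second sample item, which has no namelink, still produces its own row with an empty first column instead of shifting the rows after it. To write a file, the same writer calls can be pointed at open('output.csv', 'w', newline='') in place of the buffer.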

Upvotes: 1
