Reputation: 1212
I have XML data that also contains HTML. I'm trying to write this XML into a single cell of a CSV file that also has other columns. Right now the XML is split across several adjacent cells, so reading the CSV with pandas throws an error:
Error tokenizing data. C error: Expected 94 fields in line 3, saw 221
I also looked into a similar question, but it didn't help because the data there came from a database, so the workarounds are different.
I am not looking to parse the XML data. I just want to save the entire XML data into one cell in a csv file.
Moreover, I cannot share the data snapshot for confidentiality reasons but I hope the issue is conveyed.
Any help is appreciated.
Upvotes: 2
Views: 247
Reputation: 177
You can use the built-in csv module. Try passing the XML as a string inside a list, so that it is written as a single field:
import csv
xml = ["""<?xml version="1.0"?>
<catalog>
<book id="bk101">
<author>Gambardella, Matthew</author>
<title>XML Developer's Guide</title>
<genre>Computer</genre>
<price>44.95</price>
<publish_date>2000-10-01</publish_date>
<description>An in-depth look at creating applications
with XML.</description>
</book>
</catalog>"""]
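# newline="" lets the csv module control line endings itself
with open("test.csv", "w", encoding="utf8", newline="") as out_file:
    writer = csv.writer(out_file)
    # the one-element list becomes one quoted cell
    writer.writerow(xml)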
You should then be able to read it with pandas.
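To check that this also works when the row has other columns, here is a minimal round-trip sketch using only the standard library. The column names (`id`, `name`, `xml`), the sample payload and the temporary file path are made up for illustration; the point is that `csv.writer` quotes any field containing commas, double quotes or newlines, so the XML stays in one cell.

```python
import csv
import os
import tempfile

# Hypothetical sample payload: contains commas, double quotes and newlines,
# the characters that split a naively written CSV row into extra cells.
xml = (
    '<?xml version="1.0"?>\n'
    "<note>\n"
    "  <to>Tove, Jani</to>\n"
    '  <body><p class="msg">Hello, world</p></body>\n'
    "</note>"
)

# Assumed output location: a throwaway temp directory.
path = os.path.join(tempfile.mkdtemp(), "with_xml.csv")

# newline="" lets the csv module control line endings itself.
with open(path, "w", encoding="utf8", newline="") as out_file:
    writer = csv.writer(out_file)  # default quoting wraps the XML cell in quotes
    writer.writerow(["id", "name", "xml"])
    writer.writerow([1, "first", xml])

# Read the file back and inspect the data row.
with open(path, "r", encoding="utf8", newline="") as in_file:
    rows_back = list(csv.reader(in_file))

print(len(rows_back[1]))       # number of cells in the data row
print(rows_back[1][2] == xml)  # whether the XML came back unchanged
```

If the data row comes back with exactly three cells, the file was quoted correctly and pandas should be able to read it with `pd.read_csv` as well.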
Upvotes: 2
Reputation: 2718
import pandas as pd
with open('note.xml', 'r') as f:
    data = f.read()
df = pd.DataFrame(data = {'xml_file': [data]})
df.to_csv('xml_as_csv.csv')
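A rough sketch of the same idea with extra columns, round-tripped through pandas to confirm the XML stays in one cell. The column names and the sample XML are assumptions for illustration, and an in-memory buffer stands in for a real file. `index=False` is used here only to keep the row index out of the output.

```python
import io

import pandas as pd

# Hypothetical XML payload with commas, quotes and newlines in it.
xml_text = (
    "<note>\n"
    "  <to>Tove, Jani</to>\n"
    '  <body><b>Hello</b>, "world"</body>\n'
    "</note>"
)

# One row with ordinary columns next to the XML column.
df = pd.DataFrame({"id": [1], "source": ["feed_a"], "xml_file": [xml_text]})

# Write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
df.to_csv(buffer, index=False)  # to_csv quotes the XML field automatically

# Read it back the way the question does.
buffer.seek(0)
restored = pd.read_csv(buffer)

print(restored.shape)
print(restored.loc[0, "xml_file"] == xml_text)
```

The key point is that the CSV is produced by a CSV writer (pandas or the csv module) rather than by joining strings with commas, since the writer handles the quoting and escaping.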
Upvotes: 1