I am trying to convert an XML file to CSV, but the XML file (encoded as ISO-8859-1) apparently contains characters that the ascii codec, which Python uses when writing rows, cannot represent.
I get the error:
```
Traceback (most recent call last):
  File "convert_folder_to_csv_PLAYER.py", line 139, in <module>
    xml2csv_PLAYER(filename)
  File "convert_folder_to_csv_PLAYER.py", line 121, in xml2csv_PLAYER
    fout.writerow(row)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in position 4: ordinal not in range(128)
```
I have tried opening the file as follows:

```python
dom1 = parse(input_filename.encode("utf-8"))
```

I have also tried replacing the \xe1 character in each row before it is written. Any suggestions?
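The XML parser returns unicode objects. That's a good thing. The problem is that Python 2's csv module can't deal with them: it converts each value to a byte string using the default ascii codec, so any non-ASCII character such as u'\xe1' raises UnicodeEncodeError.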
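To see the failure in isolation: the character from the traceback, u'\xe1' ('á', code point 225), has no representation in ASCII (code points 0-127), while UTF-8 and ISO-8859-1 both can encode it. A small illustration, written in Python 3 syntax so it runs on a current interpreter (the question itself uses Python 2); the sample string "M\xe1laga" is made up for the demo:

```python
# u'\xe1' from the traceback is 'á' (LATIN SMALL LETTER A WITH ACUTE).
# Sample string chosen for illustration only.
text = "M\xe1laga"

# ASCII only covers code points 0-127, so encoding 'á' (225) fails.
try:
    text.encode("ascii")
    ascii_ok = True
    bad_position = None
except UnicodeEncodeError as exc:
    ascii_ok = False
    bad_position = exc.start  # index of the first character that failed

# Encodings that can represent 'á' produce the bytes a CSV writer must emit.
utf8_bytes = text.encode("utf-8")         # 'á' becomes two bytes: C3 A1
latin1_bytes = text.encode("iso-8859-1")  # 'á' stays one byte: E1
```

This is why encoding each value explicitly (or using a writer that does it for you) makes the error go away: the ascii codec is simply never asked to handle 'á'.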
You could encode each unicode string returned by the XML parser (for example with .encode("utf-8")) before handing it to the csv writer, but a better idea is to use this UnicodeWriter recipe from the official documentation of the csv module:
```python
# Python 2 recipe (relies on cStringIO and unicode objects)
import csv, codecs, cStringIO

class UnicodeWriter:
    """
    A CSV writer which will write rows to CSV file "f",
    which is encoded in the given encoding.
    """

    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        # Redirect output to a queue
        self.queue = cStringIO.StringIO()
        self.writer = csv.writer(self.queue, dialect=dialect, **kwds)
        self.stream = f
        self.encoder = codecs.getincrementalencoder(encoding)()

    def writerow(self, row):
        # Every item must be a unicode (or plain ASCII str) object
        self.writer.writerow([s.encode("utf-8") for s in row])
        # Fetch UTF-8 output from the queue ...
        data = self.queue.getvalue()
        data = data.decode("utf-8")
        # ... and reencode it into the target encoding
        data = self.encoder.encode(data)
        # write to the target stream
        self.stream.write(data)
        # empty queue
        self.queue.truncate(0)

    def writerows(self, rows):
        for row in rows:
            self.writerow(row)
```
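If the conversion can run under Python 3 instead, the recipe is not needed: Python 3's csv module writes str (Unicode) values directly, and the output encoding is chosen when the file is opened. The sketch below assumes a hypothetical XML layout of `<player name="..." team="..."/>` elements (the question doesn't show its real schema), and the helper names `xml_to_rows`, `rows_to_csv_text` and `write_csv` are made up for illustration:

```python
import csv
import io
from xml.dom.minidom import parseString

# Hypothetical sample input; the real schema is not shown in the question.
# The bytes are ISO-8859-1 encoded, matching the declaration in the prolog.
XML_BYTES = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>'
    '<players><player name="Jos\xe9" team="M\xe1laga"/></players>'
).encode("iso-8859-1")


def xml_to_rows(xml_bytes):
    """Parse the XML bytes (expat honours the declared encoding) into rows of str."""
    dom = parseString(xml_bytes)
    return [
        [p.getAttribute("name"), p.getAttribute("team")]
        for p in dom.getElementsByTagName("player")
    ]


def rows_to_csv_text(rows):
    """Render rows as CSV text in memory; no encoding step is involved yet."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()


def write_csv(path, rows, encoding="utf-8"):
    """Write rows to a file; the encoding argument decides how 'á' is stored."""
    # The csv docs recommend newline="" for files handed to csv.writer.
    with open(path, "w", encoding=encoding, newline="") as f:
        csv.writer(f).writerows(rows)
```

Usage would be something like `write_csv("players.csv", xml_to_rows(data))`, where `data` holds the raw bytes of the XML file. Passing bytes rather than a decoded string lets the parser pick up the ISO-8859-1 declaration itself.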