Reputation: 1344
I am trying to write some Dutch text to a CSV file, and this is what happens:
In the following program, "Eéntalige affiche in Halle !!" should be written to the CSV file. However, it writes "EÃ©ntalige affiche in Halle !!" instead.
# -*- encoding: utf-8 -*-
import csv
S="Eéntalige affiche in Halle !!".encode("utf-8")
file=c = csv.writer(open("Test.csv","wb"))
file.writerow([S])
In the CSV file, I get: "EÃ©ntalige affiche in Halle !!"
Upvotes: 3
Views: 6225
Reputation: 1122412
You are writing data correctly. The problem lies in whatever is reading the data; it is interpreting the UTF-8 data as Latin 1 instead:
>>> print('E\xe9ntalige affiche in Halle !!')
Eéntalige affiche in Halle !!
>>> 'E\xe9ntalige affiche in Halle !!'.encode('utf8')
b'E\xc3\xa9ntalige affiche in Halle !!'
>>> print('E\xe9ntalige affiche in Halle !!'.encode('utf8').decode('latin1'))
EÃ©ntalige affiche in Halle !!
The U+00E9 codepoint (é, LATIN SMALL LETTER E WITH ACUTE) is encoded to two bytes in UTF-8, C3 and A9 in hex. If you treat those two bytes as Latin 1 instead, where each character is always exactly one byte, you get Ã and © instead.
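To make the byte-level picture concrete, here is a small Python 3 sketch. The variable names are only for illustration. It shows the two UTF-8 bytes for é, the misreading as Latin 1, and how reversing that misreading recovers the original text:

```python
# The original text, with é written as an escape so the source file's
# own encoding does not matter.
text = "E\u00e9ntalige affiche in Halle !!"

# Encoding to UTF-8 turns é (U+00E9) into the two bytes C3 A9.
utf8_bytes = text.encode("utf-8")
e_acute_bytes = "\u00e9".encode("utf-8")
print(e_acute_bytes.hex())  # c3a9

# A reader that assumes Latin 1 maps each byte to one character:
# C3 becomes Ã (U+00C3) and A9 becomes © (U+00A9).
misread = utf8_bytes.decode("latin-1")

# Reversing the wrong decode recovers the original text, which confirms
# that the bytes on disk were correct all along.
repaired = misread.encode("latin-1").decode("utf-8")
print(repaired == text)  # True
```

The round trip in the last step works here because Latin 1 maps every byte value to exactly one character, so nothing is lost by the wrong decode.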
There is no standard for how CSV files declare or handle encoding; you'll need to match your encoding to whatever application will read the file. Microsoft Excel, for example, reads CSV files according to the current codepage.
If your CSV reader is expecting Latin 1, by all means, encode to Latin 1 instead.
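As a minimal sketch of picking the encoding to suit the reader, assuming Python 3 (where the file is opened in text mode and the `csv` module is handed `str` values rather than pre-encoded bytes): the helper name `write_row` and the file name below are hypothetical, chosen just for this example.

```python
import csv
import os
import tempfile


def write_row(path, row, encoding):
    """Write a single CSV row to path using the given text encoding."""
    # newline="" follows the csv module's documented recommendation, so the
    # writer controls line endings itself.
    with open(path, "w", newline="", encoding=encoding) as handle:
        csv.writer(handle).writerow(row)


# Usage: write the row as Latin 1 for a reader that expects Latin 1.
with tempfile.TemporaryDirectory() as folder:
    target = os.path.join(folder, "Test.csv")
    write_row(target, ["E\u00e9ntalige affiche in Halle !!"], "latin-1")
    with open(target, "rb") as handle:
        raw = handle.read()
    # é is now the single byte E9 instead of the UTF-8 pair C3 A9.
    print(raw)
```

If the target application can detect a byte order mark, passing `"utf-8-sig"` as the encoding is another option to try; whether a given reader honors it is something to verify against that application.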
Upvotes: 3