Reputation: 16219
With:
# -*- coding: utf-8 -*-
at the top of my .ipynb, Jupyter is now displaying accented characters correctly.
But when I export a pandas data frame containing accented characters to csv (with .to_csv()):
... the characters do not render properly when the csv is opened in Excel.
This is the case whether I set encoding='utf-8' or not. Is pandas/python doing all that it can here, and this is an Excel issue? Or can something be done before the export to csv?
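For what it's worth, one thing to try before the export: the 'utf-8-sig' codec writes the same UTF-8 bytes but prepends a byte-order mark, which is what Excel looks for when deciding whether a double-clicked .csv is UTF-8. A minimal sketch (the file names and data below are made up):

```python
import pandas as pd

# Hypothetical data frame with accented characters
df = pd.DataFrame({"name": ["café", "naïve", "Zoë"]})

# Plain UTF-8: the bytes are correct, but Excel may guess the wrong encoding
df.to_csv("plain.csv", encoding="utf-8", index=False)

# 'utf-8-sig' prepends a BOM, which lets Excel detect UTF-8 on open
df.to_csv("with_bom.csv", encoding="utf-8-sig", index=False)
```

The underlying data is the same either way; only the three-byte BOM at the start of the file differs.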
Upvotes: 8
Views: 10405
Reputation: 3294
I encountered a similar issue when attempting a read_json followed by a to_excel:
import pandas

df = pandas.read_json(myfilepath)
# causes garbled characters
df.to_excel(sheetpath, encoding='utf8')
# also causes garbled characters
df.to_excel(sheetpath, encoding='latin1')
Turns out, if I load the json manually with the json module first, and then export with to_excel, the issue doesn't occur:
import json
import pandas

with open(myfilepath, encoding='utf8') as f:
    j = json.load(f)

df = pandas.DataFrame(j)
df.to_excel(sheetpath, encoding='utf8')
Upvotes: 1
Reputation: 61
I had the same problem, and writing to .xlsx and renaming to .csv didn't solve the problem (for application-specific reasons I won't go into here), nor was I able to successfully use an alternate encoding as Juliana Rivera recommended. 'Manually' writing the data as text worked for me.
with open(RESULT_FP + '.csv', 'w+') as rf:
    for row in output:
        row = ','.join(list(map(str, row))) + '\n'
        rf.write(row)
Sometimes I guess you just have to go back to basics.
Upvotes: 1
Reputation: 31
I also had the same inconvenience. When I checked the DataFrame in the Jupyter notebook I saw that everything was in order.
The problem happened when I tried to open the file directly (since it has a .csv extension, Excel opens it directly).
The solution for me was to open a new blank Excel workbook and import the file from the "Data" tab.
Then I just chose the right delimiter, and that was it for me.
Upvotes: 3
Reputation: 8064
I had a similar problem, also on a Mac. I noticed that the unicode string showed up fine when I opened the csv in TextEdit, but showed up garbled when I opened it in Excel.
Thus, I don't think there is any way to successfully export unicode to Excel with to_csv, but I'd expect the default to_excel writer to suffice:
df.to_excel('file.xlsx', encoding='utf-8')
Upvotes: 3
Reputation: 908
I think using a different Excel writer helps; I recommend xlsxwriter:
import pandas as pd
df = ...
writer = pd.ExcelWriter('file.xlsx', engine='xlsxwriter')
df.to_excel(writer)
writer.save()
Upvotes: 2
Reputation: 1083
If you want to keep accents, try with encoding='iso-8859-1':
df.to_csv(path, encoding='iso-8859-1', sep=';')
Upvotes: 9
Reputation: 341
Maybe try this function for your columns if you can't get Excel to cooperate. It will remove the accents using the unicodedata library:
import unicodedata

def remove_accents(input_str):
    # 'str' here; in Python 2 this check was against 'unicode'
    if isinstance(input_str, str):
        nfkd_form = unicodedata.normalize('NFKD', input_str)
        return ''.join(c for c in nfkd_form if not unicodedata.combining(c))
    return input_str
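Applied to a data frame before the export, this strips each accented character down to its base letter (the column name and values below are made up for illustration):

```python
import unicodedata
import pandas as pd

def remove_accents(input_str):
    # NFKD splits accented characters into base letter + combining mark,
    # then the combining marks are dropped
    if isinstance(input_str, str):
        nfkd_form = unicodedata.normalize('NFKD', input_str)
        return ''.join(c for c in nfkd_form if not unicodedata.combining(c))
    return input_str

# Hypothetical column; apply to every text column before to_csv
df = pd.DataFrame({'city': ['São Paulo', 'Zürich', 'Reykjavík']})
df['city'] = df['city'].apply(remove_accents)
# df['city'] is now ['Sao Paulo', 'Zurich', 'Reykjavik']
```

The result is plain ASCII, so any encoding Excel guesses will display it the same way, at the cost of losing the accents.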
Upvotes: 1