Printing a dataframe to CSV file keeping the non-latin characters

Question

I have a data frame with almost 55,000 rows in Python. Some cells contain non-Latin characters, and when I write it out with df.to_csv('./df.csv'), they show up as different characters.

For instance, とある魔術の禁書目録 3 (Toaru Majutsu no Index, #3) is printed as ã¨ã‚ã‚‹é”è¡“ã®ç¦æ›¸ç›®éŒ² 3 (Toaru Majutsu no Index, #3) in the CSV file.

How can I preserve the original spellings in the CSV file?

Hubert Dudek · Accepted Answer

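The file is most likely already written as UTF-8, but the program opening it (Excel, for example) guesses a different encoding because there is no byte order mark (BOM). Try one of these: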

df.to_csv('./df.csv', encoding='utf-8-sig')
df.to_csv('./df.csv', encoding='utf-16')

The Python documentation describes the utf-8-sig codec as follows:

This module implements a variant of the UTF-8 codec: On encoding a UTF-8 encoded BOM will be prepended to the UTF-8 encoded bytes. For the stateful encoder this is only done once (on the first write to the byte stream). For decoding an optional UTF-8 encoded BOM at the start of the data will be skipped.

source: https://docs.python.org/2.5/lib/module-encodings.utf-8-sig.html
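
To see what that means in practice, here is a minimal sketch using only the standard library (no pandas). It assumes the garbling comes from UTF-8 bytes being read as a single-byte Western encoding; Latin-1 is used here as a stand-in because it never fails to decode, so the exact garbled characters may differ slightly from what your viewer shows. The variable names are just for illustration.

```python
import codecs

title = "とある魔術の禁書目録 3"

# Plain UTF-8 versus the BOM-prefixed variant.
plain = title.encode("utf-8")
with_bom = title.encode("utf-8-sig")

# utf-8-sig only adds the 3-byte BOM in front; the payload is identical.
assert with_bom == codecs.BOM_UTF8 + plain

# Assumption: the viewer misreads the UTF-8 bytes as a single-byte
# encoding. Latin-1 stands in for whatever the viewer actually picked.
garbled = plain.decode("latin-1")

# Decoding with utf-8-sig skips the BOM and restores the original text.
restored = with_bom.decode("utf-8-sig")

print(repr(with_bom[:3]))
print(restored == title)
```

The BOM does not change the data. It is only a hint that lets tools such as Excel recognise the file as UTF-8 instead of guessing.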
