I have a data frame with almost 55,000 rows in Python. Some cells contain non-Latin characters, and when I write the frame with df.to_csv('./df.csv'), those characters show up as different characters.
For instance, とある魔術の禁書目録 3 (Toaru Majutsu no Index, #3) appears as ã¨ã‚ã‚‹é”è¡“ã®ç¦æ›¸ç›®éŒ² 3 (Toaru Majutsu no Index, #3) in the CSV file.
How can I preserve the original spellings in the CSV file?
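A minimal sketch of how this kind of garbling arises, assuming the CSV holds correct UTF-8 bytes (the default encoding of to_csv) and the program viewing it decodes them as Windows-1252 (cp1252). The errors="replace" argument is only there because a few UTF-8 bytes have no cp1252 mapping; a real viewer may drop or substitute them differently.

```python
title = "とある魔術の禁書目録 3 (Toaru Majutsu no Index, #3)"

# What to_csv writes by default: UTF-8 bytes.
raw = title.encode("utf-8")

# What a viewer assuming Windows-1252 shows. Some UTF-8 bytes (e.g. 0x81)
# are unmapped in cp1252, so errors="replace" substitutes U+FFFD for them.
garbled = raw.decode("cp1252", errors="replace")
print(garbled)

# Decoding the same bytes as UTF-8 recovers the original text unchanged.
print(raw.decode("utf-8") == title)
```

The ASCII part of the title survives untouched because ASCII bytes mean the same thing in both encodings; only the multi-byte Japanese characters turn into sequences like "ã¨".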
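pandas writes UTF-8 by default, so the file itself is most likely fine; the garbling appears when the CSV is opened by a program that assumes a legacy code page such as Windows-1252 (typically a spreadsheet program like Excel). Writing a byte order mark (BOM) at the start of the file lets such programs detect the encoding. Try one of these: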
df.to_csv('./df.csv', encoding='utf-8-sig')
df.to_csv('./df.csv', encoding='utf-16')
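A quick way to check both options, assuming pandas is installed. The temporary directory and index=False are not part of the answer; they only keep the sketch self-contained and its output small. It writes the sample title with each encoding, looks at the first bytes of the file (the BOM), and reads the file back with the same encoding.

```python
import os
import tempfile

import pandas as pd

title = "とある魔術の禁書目録 3 (Toaru Majutsu no Index, #3)"
df = pd.DataFrame({"title": [title]})

results = {}
with tempfile.TemporaryDirectory() as tmp:
    for enc in ("utf-8-sig", "utf-16"):
        path = os.path.join(tmp, f"df-{enc}.csv")
        df.to_csv(path, encoding=enc, index=False)
        # The first bytes of the file hold the BOM written by the codec.
        with open(path, "rb") as f:
            head = f.read(3)
        # Reading with the same encoding strips the BOM and restores the text.
        back = pd.read_csv(path, encoding=enc)
        results[enc] = (head, back["title"][0])

for enc, (head, value) in results.items():
    print(enc, head, value == title)
```

For utf-8-sig the file starts with the bytes EF BB BF; for utf-16 it starts with FF FE or FE FF depending on the machine's byte order. Either marker is what lets the viewing program pick the right encoding.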
From the Python documentation of the utf-8-sig codec:
This module implements a variant of the UTF-8 codec: On encoding a UTF-8 encoded BOM will be prepended to the UTF-8 encoded bytes. For the stateful encoder this is only done once (on the first write to the byte stream). For decoding an optional UTF-8 encoded BOM at the start of the data will be skipped.
source: https://docs.python.org/2.5/lib/module-encodings.utf-8-sig.html
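The three behaviours in that quote can be checked directly with the standard codecs module; the linked page is from Python 2.5, but the codec behaves the same way in Python 3. A minimal sketch:

```python
import codecs

text = "禁書目録"

# Encoding: a UTF-8 BOM is prepended to the UTF-8 bytes.
encoded = text.encode("utf-8-sig")
print(encoded[:3] == codecs.BOM_UTF8)

# Stateful (incremental) encoder: the BOM is emitted only on the first write.
encoder = codecs.getincrementalencoder("utf-8-sig")()
chunks = encoder.encode("禁書") + encoder.encode("目録")

# Decoding: a leading BOM is skipped, and its absence is also accepted.
with_bom = encoded.decode("utf-8-sig")
without_bom = text.encode("utf-8").decode("utf-8-sig")

# Plain utf-8 keeps the BOM as the character U+FEFF instead of skipping it.
plain = encoded.decode("utf-8")
print(repr(plain[0]))
```

The last line is why reading a utf-8-sig file back with plain utf-8 can leave a stray U+FEFF at the start of the first column name.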