Reputation: 5342
While writing strings containing certain special characters, such as
Töölönlahdenkatu
using to_csv from pandas, the result in the csv looks like
T%C3%B6%C3%B6l%C3%B6nlahdenkatu
How do we get to write the text of string as it is? This is my to_csv command
df.to_csv(csv_path,index=False,encoding='utf8')
I have even tried
df.to_csv(csv_path,index=False,encoding='utf-8')
df.to_csv(csv_path,index=False,encoding='utf-8-sig')
and still no success.There are other characters replaced with random symbols
'-' to –
Is there a workaround?
Upvotes: 0
Views: 463
Reputation: 71
Special characters like ö
cannot be stored in a csv the same way english letters can. The "random symbols" tell a program like excel to interpret the letters as special characters when you open the file, but special characters cannot be seen when you view the csv in vscode (for instance).
Upvotes: 0
Reputation: 1265
What you're trying to do is remove German umlauts and Spanish tildes. There is an easy solution for that.
import unicodedata
data = u'Töölönlahdenkatu Adiós Pequeño'
english = unicodedata.normalize('NFKD', data).encode('ASCII', 'ignore')
print(english)
output : b'Toolonlahdenkatu Adios Pequeno'
Let me know if it works or if there are any edge cases.
Upvotes: 1