CultureQuant

Reputation: 227

Problems writing a pandas DataFrame to a Unicode text file

I have written a program that generates a Unicode text file to upload to a website. I prototyped the file in Microsoft Access and found that the website rejects it as not a Unicode text file when it is encoded as UTF-8. If I instead save the file from Windows Notepad with plain "Unicode" selected in the encoding drop-down, the website accepts it just fine.

So, with this context in mind, I've written a program to autogenerate the file using pandas and DataFrames. The last line of my program exports the DataFrame to a text file:

coa1.to_csv('0000-2951-test.txt', index=False, sep='\t', encoding='utf-8')

This generates the right file, but when I open it in Notepad, the encoding is listed as ANSI rather than Unicode. How do I write my DataFrame to a Unicode file? And which Unicode encoding does Notepad mean by plain "Unicode" (with no additional qualifier)?

Upvotes: 3

Views: 2786

Answers (1)

bpgergo

Reputation: 16037

"Unicode" is not one particular encoding. It is a character set that can be stored in several encodings, such as UTF-8, UTF-16, or UTF-32.

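On Windows, "Unicode" usually means UTF-16 (in Notepad's save dialog, specifically little-endian UTF-16 with a byte order mark), so it is quite possible that this website expects UTF-16. Try encoding your CSV as UTF-16, then check whether Notepad reports it as "Unicode" and whether the website accepts it: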

coa1.to_csv('0000-2951-test.txt', index=False, sep='\t', encoding='utf-16')
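If you want to confirm what was actually written without relying on Notepad's label, you can look at the first bytes of the file for a byte order mark (BOM). The following is only a rough sketch: `sniff_bom` and `write_demo` are hypothetical helper names, and the demo files are written with plain `open()` as a stand-in for the `to_csv` output, on the assumption that pandas hands the `encoding` argument to the same Python codecs. Plain UTF-8 has no BOM, which is most likely why Notepad falls back to "ANSI" for an all-ASCII file.

```python
import codecs
import os
import tempfile

# Check the longer BOMs first: the UTF-32 LE BOM starts with the same
# two bytes as the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(path):
    """Return a label for the BOM at the start of the file, or None if there is none."""
    with open(path, "rb") as f:
        head = f.read(4)
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    return None


def write_demo(path, encoding):
    """Write a tiny tab-separated file (hypothetical data) in the given encoding."""
    with open(path, "w", encoding=encoding, newline="") as f:
        f.write("account\tname\r\n1000\tCash\r\n")


tmpdir = tempfile.mkdtemp()
utf8_path = os.path.join(tmpdir, "demo-utf8.txt")
utf16_path = os.path.join(tmpdir, "demo-utf16.txt")

write_demo(utf8_path, "utf-8")
write_demo(utf16_path, "utf-16")

print(sniff_bom(utf8_path))   # None: plain UTF-8 is written without a BOM
print(sniff_bom(utf16_path))  # a UTF-16 label; byte order follows the machine
```

To check your real file, call `sniff_bom('0000-2951-test.txt')` after `to_csv` has run. A UTF-16 label there would suggest the file matches what Notepad calls "Unicode".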

EDIT: Instead of Windows Notepad, I suggest prototyping with a more capable text editor that lets you set the encoding precisely. I would use Sublime or Notepad++.

Upvotes: 5
