user1058744

Reputation: 55

python2.7 - reading a dictionary from a .txt file riddled with unicode

I enrolled in a Chinese Studies course some time ago, and I thought it'd be a great exercise to write a flashcard program in Python. I'm storing the flashcard lists as a dictionary in a .txt file, so far without trouble. The real problems kick in when I try to load the file, encoded in UTF-8, into my program. An excerpt of my code:

import codecs

f = codecs.open(('list.txt'),'r','utf-8')
quiz_list = eval(f.read())

quizy = str(quiz_list).encode('utf-8')

print quizy

Now, if for example list.txt consists of:

{'character1':'男人'}

what is printed is actually

{'character1': '\xe7\x94\xb7\xe4\xba\xba'}

Obviously there are some serious encoding issues here, but I cannot for the life of me understand where they occur. I am working in a terminal that supports UTF-8 (not the standard cmd.exe), so that is not the problem. Reading a normal list.txt without the curly dict-bits returns the Chinese characters without a problem, so my guess is that I'm not handling the dictionary part correctly. Any thoughts would be greatly appreciated!

Upvotes: 3

Views: 351

Answers (2)

mac

Reputation: 43061

There's nothing wrong with your encoding... Look at this:

>>> d = {1:'男人'}
>>> d[1]
'\xe7\x94\xb7\xe4\xba\xba'
>>> print d[1]
男人

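Printing a string is one thing; printing its representation is another. Echoing d[1] at the prompt shows repr(d[1]), which writes every non-ASCII byte as a \x escape, while print d[1] sends the bytes themselves to the terminal, which renders them as characters.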
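
The same distinction can be checked without a terminal. This is only a minimal sketch, written to run on both Python 2.7 and Python 3; the names raw, escaped and text are illustrative and do not come from the question.

```python
# -*- coding: utf-8 -*-
# The UTF-8 bytes of the two characters from the question (男人).
raw = u'\u7537\u4eba'.encode('utf-8')

# repr() is what the interactive prompt echoes, and what a dict uses
# for its values when it is converted with str(). It escapes non-ASCII bytes.
escaped = repr(raw)

# Decoding the bytes gives back the actual two-character text.
text = raw.decode('utf-8')

# Six bytes on disk, two characters of text.
byte_count = len(raw)
char_count = len(text)
```

Here escaped contains the \xe7\x94\xb7\xe4\xba\xba sequence seen above, while text is the string a UTF-8 terminal would display as 男人.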

Upvotes: 3

ephemient

Reputation: 204926

str(quiz_list) calls repr(quiz_list['character1']), which produces an ASCII-only representation of the string value, with every non-ASCII byte written as a \x escape. If you just print quiz_list['character1'], you'll see that the string itself holds the correct characters.
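
One way to display the whole dictionary without going through repr() is to format each entry yourself. The following is a rough sketch, not the only approach; to_text and format_cards are hypothetical helper names, and it assumes the keys are strings and the values are UTF-8 byte strings or text.

```python
# -*- coding: utf-8 -*-

def to_text(value, encoding='utf-8'):
    """Return value as text, decoding it first if it is a byte string."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def format_cards(cards, encoding='utf-8'):
    """Build one 'key: value' line per entry, sorted by key.

    The values are inserted as text, so repr() is never applied to them
    and no \\x escapes appear in the result.
    """
    lines = [u'%s: %s' % (to_text(key, encoding), to_text(value, encoding))
             for key, value in sorted(cards.items())]
    return u'\n'.join(lines)


# Example data shaped like the dictionary in the question:
# the value is a UTF-8 byte string, as produced by eval() on Python 2.
quiz_list = {'character1': u'\u7537\u4eba'.encode('utf-8')}
formatted = format_cards(quiz_list)
```

Printing formatted in a UTF-8 terminal should then show character1: 男人 rather than the escaped bytes.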

Upvotes: 2
