python2.7 - reading a dictionary from a .txt file riddled with unicode

Question

I enrolled in a Chinese Studies course some time ago, and I thought it'd be a great exercise to write a flashcard program in Python. I'm storing the flashcard lists as a dictionary in a .txt file, so far without trouble. The real problems kick in when I try to load the file, which is encoded in utf-8, into my program. An excerpt of my code:

import codecs

f = codecs.open(('list.txt'),'r','utf-8')
quiz_list = eval(f.read())

quizy = str(quiz_list).encode('utf-8')

print quizy

Now, if for example list.txt consists of:

{'character1':'男人'}

what is printed is actually

{'character1': '\xe7\x94\xb7\xe4\xba\xba'}

Obviously there are some serious encoding issues here, but I cannot for the life of me understand where they occur. I am working in a terminal that supports utf-8 (not the standard cmd.exe), so the terminal is not the problem. Reading a normal list.txt without the curly dict-bits returns the Chinese characters without a problem, so my guess is that I'm not handling the dictionary part correctly. Any thoughts would be greatly appreciated!

mac · Accepted Answer

There's nothing wrong with your encoding... Look at this:

>>> d = {1:'男人'}
>>> d[1]
'\xe7\x94\xb7\xe4\xba\xba'
>>> print d[1]
男人

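Printing a string is one thing; printing its representation is another. When you print a dictionary (or call str() on it), Python 2 shows the repr() of each key and value, so a byte string appears as escape sequences. The data itself is intact, as print d[1] shows.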
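To get readable output, print the values themselves rather than the dictionary as a whole. Below is a minimal sketch of that idea, written so that it should run on both Python 2.7 and Python 3. The helper names (parse_cards, load_cards, format_cards, to_text) are made up for illustration and are not from the question. It also swaps eval() for ast.literal_eval(), which only accepts literals; that is my substitution, not something the original code did.

```python
import ast
import io


def parse_cards(text):
    """Turn the text of a dict literal into a dict, without eval()."""
    return ast.literal_eval(text)


def load_cards(path):
    """Read a UTF-8 encoded file that contains a single dict literal."""
    with io.open(path, 'r', encoding='utf-8') as f:
        return parse_cards(f.read())


def to_text(value):
    """Decode UTF-8 byte strings; leave everything else unchanged."""
    if isinstance(value, bytes):
        return value.decode('utf-8')
    return value


def format_cards(cards):
    """Build one 'key: value' line per card from the actual text,
    instead of relying on the repr() that printing a dict would use."""
    lines = [u'%s: %s' % (to_text(key), to_text(value))
             for key, value in sorted(cards.items())]
    return u'\n'.join(lines)
```

With the list.txt from the question, print(format_cards(load_cards('list.txt'))) should show the characters themselves on a utf-8 terminal instead of the \x escapes.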
