Reputation: 1095
My test.txt file contains these characters:
地藏菩萨本愿经卷上
忉利天宫神通品第一
I have this simple program:
f = open("test.txt")
text = f.read()
f.close()
print text
for c in text:
    print c,
print "\n------------"
for i in range(len(text)):
    print text[i],
Here is the result:
地藏菩萨本愿经卷上
忉利天宫神通品第一
------------
å œ ° è — マ è マ © è ミ ¨ æ œ ¬ æ „ ¿ ç » マ å ヘ · ä ¸ Š
å ¿ ‰ å ˆ © å ¤ © å ® « ç ¥ ž é € š å “ チ ç ¬ ¬ ä ¸ €
å œ ° è — マ è マ © è ミ ¨ æ œ ¬ æ „ ¿ ç » マ å ヘ · ä ¸ Š
å ¿ ‰ å ˆ © å ¤ © å ® « ç ¥ ž é € š å “ チ ç ¬ ¬ ä ¸ €
"text" prints correctly when I use "print text", but both attempts to print it character by character fail.
What's happening?
Upvotes: 3
Views: 109
Reputation: 250871
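You need to decode the bytes read from the file from UTF-8 first. In Python 2, f.read() returns a byte string, so looping over it yields single bytes instead of whole characters: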
>>> with open('abc1') as f:
...     text = f.read().decode('utf-8')
...
>>> print text
地藏菩萨本愿经卷上 忉利天宫神通品第一
>>> for x in text:
...     print x,
...
地 藏 菩 萨 本 愿 经 卷 上 忉 利 天 宫 神 通 品 第 一
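To see why the undecoded loop prints garbage, here is a minimal sketch that compares the byte count with the character count for the first four characters of the file. The variable names (sample, raw, decoded) are made up for illustration, and the snippet is written so that it should run on both Python 2 and Python 3:

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

# First four characters of the file, as a unicode string.
sample = u"地藏菩萨"

# What a plain open().read() gives you in Python 2: the UTF-8 bytes.
raw = sample.encode("utf-8")

# What .decode('utf-8') gives you back: one item per character.
decoded = raw.decode("utf-8")

byte_count = len(raw)
char_count = len(decoded)
print(byte_count, char_count)
```

Each of these characters takes three bytes in UTF-8, so looping over the raw bytes prints three meaningless pieces per character, while looping over the decoded string prints one character at a time.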
Or use io.open to open the file with the required encoding:
>>> import io
>>> with io.open('abc1', encoding='utf-8') as f:
...     text = f.read()
...
>>> for x in text:
...     print x,
...
地 藏 菩 萨 本 愿 经 卷 上 忉 利 天 宫 神 通 品 第 一
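As a rough end-to-end check, the sketch below writes the two lines to a scratch file and reads them back both ways. The temporary file stands in for test.txt and is an assumption of this example, as are the variable names; io.open is used throughout so the same code should behave the same on Python 2 and Python 3:

```python
# -*- coding: utf-8 -*-
from __future__ import print_function
import io
import os
import tempfile

sample = u"地藏菩萨本愿经卷上\n忉利天宫神通品第一\n"

# Hypothetical scratch file standing in for test.txt.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    # newline="" keeps "\n" as-is so the byte count is the same on every OS.
    with io.open(path, "w", encoding="utf-8", newline="") as f:
        f.write(sample)

    # Binary mode: raw bytes, nothing is decoded.
    with io.open(path, "rb") as f:
        raw = f.read()

    # Text mode with an explicit encoding: decoded characters.
    with io.open(path, encoding="utf-8") as f:
        text = f.read()
finally:
    os.remove(path)

# Drop the line breaks to count only the visible characters.
chars = [c for c in text if c != u"\n"]
print(len(raw), len(text), len(chars))
```

The byte count is much larger than the character count, which is the mismatch behind the garbled output in the question.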
Upvotes: 4