Reputation: 121
My code reads the names of some users and a grade for each of them, and writes them to a text file as a table. For example:
19(someSpace*)|حمید
20(someSpace*)|وحید
70(someSpace*)|خلیل
14(someSpace*)|Hamid
def roww(STR, n):
    if n >= 10:
        return str(n) + 4*" " + "| " + STR + "\n"
    else:
        return str(n) + 5*" " + "|" + STR + "\n"

def my_table(STR, m):
    import sys
    reload(sys)
    sys.setdefaultencoding('utf8')
    import codecs as D
    f = D.open(STR + '.txt', "w", encoding='utf-8')
    i = 0
    while(i < m):
        i += 1
        a = raw_input("Name: ").encode('utf-8')
        b = raw_input("Grade: ")
        f.write(roww(a,b))
    f.close()
When I execute:
my_table("grade",3)
Name: حمید
I get this error:
UnicodeDecodeError Traceback (most recent call last)
<ipython-input-10-a48ceb393d9a> in <module>()
----> 1 my_table("grade",3)
<ipython-input-9-6a83996822a3> in my_table(STR, m)
14 while(i < m):
15 i +=1
---> 16 a = raw_input("Name: ").encode('utf-8')
17 b = raw_input("Grade")
18 f.write(roww(a,b))
C:\Users\Hamid\Anaconda2\lib\encodings\utf_8.pyc in decode(input, errors)
14
15 def decode(input, errors='strict'):
---> 16 return codecs.utf_8_decode(input, errors, True)
17
18 class IncrementalEncoder(codecs.IncrementalEncoder):
UnicodeDecodeError: 'utf8' codec can't decode byte 0xcd in position 0: invalid continuation byte
I can't solve this UTF-8 problem in Python, and I haven't found a useful answer anywhere.
Upvotes: 1
Views: 3819
Reputation: 25154
Here is a simple example that demonstrates how to read, decode, and write an Arabic text file. As Ilja already pointed out, it basically depends on your terminal: raw_input gives you a byte string that the terminal has already encoded (UTF-8 on my machine), so you actually have to decode it, not encode it.
Works fine on Mac OS X:
If you run this snippet and enter المدرالمدرالمدر as input, it works as expected.
# -*- coding: utf-8 -*-
x = raw_input("test: ").decode("utf-8")
print x
f = open("testarabic.txt", "w")
f.write(x.encode("utf-8"))
f.close()
The first line, # -*- coding: utf-8 -*-, is not needed in your case unless your Python file itself contains non-ASCII text, such as Arabic string literals.
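Applied to the table from the question, the same decode-on-input, encode-on-output idea could look roughly like the sketch below. The names format_row, write_table and input_encoding are made up for this illustration, and io.open is used here in place of codecs.open (both accept an encoding argument). It assumes the names arrive as byte strings, as they do from raw_input on Python 2, and it does not need reload(sys) or sys.setdefaultencoding.

```python
from __future__ import unicode_literals
import io


def format_row(name, grade, width=6):
    """Return one row: grade left-aligned in `width` columns, then '|' and the name."""
    return "{0:<{1}}|{2}\n".format(grade, width, name)


def write_table(path, rows, input_encoding="utf-8"):
    """Write (name_bytes, grade) pairs to `path` as UTF-8 text.

    `input_encoding` is whatever encoding the bytes arrived in, for example
    the console code page. It is an assumption the caller has to supply.
    """
    # io.open encodes on write, so everything passed to f.write() must be text.
    with io.open(path, "w", encoding="utf-8") as f:
        for name_bytes, grade in rows:
            name = name_bytes.decode(input_encoding)  # bytes -> text, exactly once
            f.write(format_row(name, grade))
```

With raw_input, each pair would be something like (raw_input("Name: "), int(raw_input("Grade: "))), and input_encoding would be "utf-8" on a UTF-8 terminal or the console code page on Windows.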
Workaround on Windows
I just tried it on Windows and, as you said, I also get ????? characters, but I realized that the Windows command line uses a different encoding. So the first question is whether your command line already displays Arabic characters correctly. If it does, find out which code page it is using by typing the following in the terminal:
1) Get Code Page of your Terminal
chcp
Active code page: 1256
2) Use the active code you got in your terminal to decode your raw-input
x = raw_input("test: ").decode("1256")
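To see why the code page matters, here is a small sketch that does not need a real console. It builds the bytes a cp1256 terminal would produce for the letters ح م د and tries both decodings. The first byte is 0xcd, the same byte named in the traceback from the question. On Python 2, calling .encode('utf-8') on a byte string first decodes it implicitly with the default encoding, which the question's code had set to utf8, and that is why an encode call ended in a UnicodeDecodeError.

```python
from __future__ import unicode_literals

# The letters hah, meem, dal, written as escapes so the source stays ASCII.
original = "\u062d\u0645\u062f"

# Bytes a cp1256 console would hand over for that text.
raw = original.encode("cp1256")

# First byte of the cp1256 form (0xcd for the letter hah).
first_byte = bytearray(raw)[0]

try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    # 0xcd announces a two-byte UTF-8 sequence, but the byte after it is not
    # a continuation byte, so the UTF-8 decoder rejects the input.
    utf8_ok = False

# Decoding with the console's actual code page recovers the text.
text = raw.decode("cp1256")
```

On an interactive console, sys.stdin.encoding usually reports the encoding in use, so it could stand in for the hard-coded code page. It can be None when input is piped, so treat that as an assumption to check on your machine.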
If your Windows command line isn't displaying Arabic characters correctly, you can set the code page by typing the following in the Windows command line:
chcp 1256
Upvotes: 1