Hamid Shafie Asl

Reputation: 121

How to use UTF-8 in Python when the user inputs Persian or Arabic text

My code reads the names of some users and a score for each one, then writes them to a text file as a table. For example:

19(someSpace*)|حمید

20(someSpace*)|وحید

70(someSpace*)|خلیل

14(someSpace*)|Hamid

def roww(STR, n):
    if n >= 10:
        return str(n) +4*" " +"| " + STR + "\n"
    else:
        return str(n) +5*" " +"|" + STR + "\n"

def my_table(STR, m):
    import sys  
    reload(sys)  
    sys.setdefaultencoding('utf8')
    import codecs as D
    f = D.open(STR + '.txt', "w", encoding='utf-8')
    i = 0
    while(i < m):
        i +=1
        a = raw_input("Name: ").encode('utf-8')
        b = raw_input("Grade: ")
        f.write(roww(a,b))
    f.close()

When I execute:

my_table("grade",3)
Name: حمید

I get this error:

    UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-10-a48ceb393d9a> in <module>()
----> 1 my_table("grade",3)

<ipython-input-9-6a83996822a3> in my_table(STR, m)
     14     while(i < m):
     15         i +=1
---> 16         a = raw_input("Name: ").encode('utf-8')
     17         b = raw_input("Grade")
     18         f.write(roww(a,b))

C:\Users\Hamid\Anaconda2\lib\encodings\utf_8.pyc in decode(input, errors)
     14 
     15 def decode(input, errors='strict'):
---> 16     return codecs.utf_8_decode(input, errors, True)
     17 
     18 class IncrementalEncoder(codecs.IncrementalEncoder):

UnicodeDecodeError: 'utf8' codec can't decode byte 0xcd in position 0: invalid continuation byte

I can't solve this UTF-8 problem in Python, and I can't find any useful answer.

Upvotes: 1

Views: 3819

Answers (1)

user1767754

Reputation: 25154

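Here is a simple example that demonstrates how to read, decode, and write Arabic text to a file. As Ilja already pointed out, it basically depends on your terminal: in Python 2, raw_input returns a byte string that the terminal has already encoded (as UTF-8 on most Unix terminals), so you have to decode it rather than encode it. Calling .encode('utf-8') on those bytes makes Python 2 implicitly decode them first, and that hidden decode step is what raises the UnicodeDecodeError.

Works fine on macOS:

If you run this snippet and enter المدرالمدرالمدر as input, it works fine.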

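# -*- coding: utf-8 -*-
# Python 2: raw_input returns bytes, so decode them to unicode first.
x = raw_input("test: ").decode("utf-8")
print x
# Encode back to UTF-8 bytes when writing to the file.
f = open("testarabic.txt", "w")
f.write(x.encode("utf-8"))
f.close()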

The first line, # -*- coding: utf-8 -*-, is not needed in your case unless your Python file itself contains non-ASCII static text.
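
As a rough sketch of how the table-writing part of the question could look once the names are handled as text: the helpers format_row and write_table below are hypothetical names, not from the original code. They assume the names have already been decoded to unicode as in the snippet above, and they let io.open do the UTF-8 encoding on write. The sketch is meant to run on both Python 2 and Python 3.

```python
import io


def format_row(name, grade, width=6):
    """Return one table line: the grade padded to `width` columns, '|', then the name."""
    # Left-aligning the grade keeps the '|' separators in one column.
    return u"{0:<{1}}|{2}\n".format(grade, width, name)


def write_table(path, rows):
    """Write (name, grade) pairs to `path` as a UTF-8 encoded table."""
    # io.open encodes text to UTF-8 on write, so no manual .encode() is needed.
    with io.open(path, "w", encoding="utf-8") as f:
        for name, grade in rows:
            f.write(format_row(name, grade))
```

Because the file object does the encoding, neither reload(sys) nor sys.setdefaultencoding is needed in this variant.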

Workaround on Windows

I just tried it on Windows and, as you said, I also get ????? characters, but I realized that the Windows command line uses a different encoding. So the first question is whether your command line already displays Arabic characters correctly. If it does, find out which code page it is using by typing the following in the terminal:

1) Get the code page of your terminal

chcp
Active code page: 1256

2) Use the active code page reported by your terminal to decode the result of raw_input

x = raw_input("test: ").decode("1256")
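
Rather than hard-coding the code page, one option is to ask Python which encoding it associates with the terminal. This is a minimal sketch, assuming sys.stdin.encoding reflects the active code page. That attribute can be None when input is redirected, in which case the sketch falls back to the locale's preferred encoding. The names console_encoding and to_text are illustrative and not part of the original answer.

```python
import locale
import sys


def console_encoding(stream=None):
    """Best-effort guess at the encoding of text typed into the terminal."""
    stream = sys.stdin if stream is None else stream
    # `encoding` can be missing or None when input is piped or redirected.
    return getattr(stream, "encoding", None) or locale.getpreferredencoding()


def to_text(raw, encoding=None):
    """Decode terminal bytes to text; pass already-decoded text through unchanged."""
    if isinstance(raw, bytes):
        return raw.decode(encoding or console_encoding())
    return raw
```

In Python 2 this would be used as x = to_text(raw_input("test: ")). In Python 3, input already returns text, so to_text simply returns it.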

If your Windows command line does not display Arabic characters correctly, you can switch its code page by typing chcp 1256 in it.
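
The traceback in the question can be reproduced without a terminal. The sketch below assumes the asker's console was using code page 1256, where the byte 0xcd stands for the letter ح. That would be consistent with the "byte 0xcd in position 0" message, but it is an inference rather than something the question states. The two bytes are chosen for illustration.

```python
# Bytes a cp1256 console would send for the letters hah and meem
# (U+062D, U+0645). Illustrative input, not captured from a real session.
raw = b"\xcd\xe3"

try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    # 0xcd opens a two-byte UTF-8 sequence, but 0xe3 is not a continuation byte.
    error = exc

text = raw.decode("cp1256")         # decode with the console's code page instead
utf8_bytes = text.encode("utf-8")   # the bytes that belong in the UTF-8 text file
```

Decoding with the console's code page first, and encoding to UTF-8 only when writing, avoids the error.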

Upvotes: 1
