Reputation: 2047
I am trying to use the Odia language in a project. When I encode an Odia string and then try to decode it, I get an error.
b = "କାହିଁକି ଏଇଠି ଅଛୁ "
x = b.encode()
print(x)
m = x.decode()
print(m)
The corresponding output is:
b'\xe0\xac\x95\xe0\xac\xbe\xe0\xac\xb9\xe0\xac\xbf\xe0\xac\x81\xe0\xac\x95\xe0\xac\xbf \xe0\xac\x8f\xe0\xac\x87\xe0\xac\xa0\xe0\xac\xbf \xe0\xac\x85\xe0\xac\x9b\xe0\xad\x81 '
Traceback (most recent call last):
File "x:\Pythonxx36\Egod\expeppp.py", line 9, in <module>
print(m)
File "C:\ProgramData\Miniconda3\envs\pygpu\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-6: character maps to <undefined>
I did not specify an encoding because I am not sure whether utf-8, utf-7, or utf-32 can encode the Odia language. But here, codecs goes directly to cp1252.py, which should not have any relation to this (though I am not sure).
So my questions are:
1. Why does the encoded text give an error during decoding?
2. Why is cp1252.py involved?
3. Is there (or can I create) an encoding for the Odia language?
(Questions 1 and 2 are the most important; 3 is optional.)
Upvotes: 0
Views: 213
Reputation: 177971
cp1252 is the default encoding for your terminal. Older versions of Python automatically encode Unicode strings to the terminal's default encoding. You don't need to explicitly encode/decode, but you do need to use a terminal/IDE that supports the encoding you need for the characters being used. UTF-8 is the usual choice since it can handle all Unicode characters.
On Windows, Python versions 3.6 and greater handle Unicode better. The terminal encoding is ignored and Windows Unicode console APIs are used to write directly to the terminal window. You'll need a terminal font that supports the language to see the characters, or use an IDE that supports UTF-8:
Python 3.7.2 (tags/v3.7.2:9a3ffc0492, Dec 23 2018, 23:09:28) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> b = "କାହିଁକି ଏଇଠି ଅଛୁ "
>>> print(b)
କାହିଁକି ଏଇଠି ଅଛୁ
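If the output is not going to a Windows console (for example, it is redirected or captured by an IDE), it can help to check which encoding print() will use. The following is a minimal sketch, not a definitive fix: the helper names stream_encoding and switch_to_utf8 are made up for illustration, and the in-memory stream only stands in for a cp1252 terminal. It relies on the reconfigure() method that text streams have in Python 3.7 and later.

```python
import io
import sys

def stream_encoding(stream):
    """Return the encoding print() would use for this stream, if it has one."""
    return getattr(stream, "encoding", None)

def switch_to_utf8(stream):
    """Switch a text stream to UTF-8 when it supports reconfigure() (Python 3.7+)."""
    if hasattr(stream, "reconfigure"):
        stream.reconfigure(encoding="utf-8")
    return stream_encoding(stream)

# An in-memory stream standing in for a terminal that uses cp1252.
fake_terminal = io.TextIOWrapper(io.BytesIO(), encoding="cp1252")
before = stream_encoding(fake_terminal)
after = switch_to_utf8(fake_terminal)
print(before, "->", after)

# The encoding of the real standard output in the current environment.
print("real stdout encoding:", stream_encoding(sys.stdout))
```

In a script, calling switch_to_utf8(sys.stdout) before printing would be the equivalent step. The program that displays the output still has to interpret the bytes as UTF-8 and have a font with Odia glyphs.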
When writing to a file, the default encoding is the value returned by locale.getpreferredencoding(False), which is going to be cp1252 for your system. Specify the encoding instead. UTF-8 works for all Unicode code points. For Python 3, use the following:
with open('out.txt', 'w', encoding='utf8') as f:
    f.write("କାହିଁକି ଏଇଠି ଅଛୁ ")
In Python 2, use io.open, which accepts the same syntax.
Always specify the encoding when reading or writing a file so code doesn't have to rely on a default that can change between different localized OS versions.
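As a small illustration of that advice, the sketch below writes the string with an explicit encoding and reads it back the same way. The temporary directory and the file name out.txt are only there to keep the example self-contained.

```python
import os
import tempfile

text = "କାହିଁକି ଏଇଠି ଅଛୁ "

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "out.txt")

    # Write with an explicit encoding instead of the locale default.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

    # Read back with the same explicit encoding.
    with open(path, "r", encoding="utf-8") as f:
        read_back = f.read()

    # The bytes on disk are exactly the UTF-8 encoding of the text.
    with open(path, "rb") as f:
        raw = f.read()

print(read_back == text)
print(raw == text.encode("utf-8"))
```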
Many Windows applications assume the default encoding rather than UTF-8 when reading a file, so you may want to use 'utf-8-sig' as the encoding. It writes a signature at the beginning of the file that Windows apps (e.g. Excel) will recognize, so they use UTF-8 instead.
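To see what the signature is, here is a short sketch that writes the same string with 'utf-8-sig' and inspects the result. The temporary file is just for the example; codecs.BOM_UTF8 is the standard library constant for the three signature bytes.

```python
import codecs
import os
import tempfile

text = "କାହିଁକି ଏଇଠି ଅଛୁ "

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "out.txt")

    # 'utf-8-sig' prepends the UTF-8 signature (byte order mark) on write.
    with open(path, "w", encoding="utf-8-sig") as f:
        f.write(text)

    with open(path, "rb") as f:
        raw = f.read()

    # Reading with 'utf-8-sig' strips the signature again.
    with open(path, "r", encoding="utf-8-sig") as f:
        clean = f.read()

    # Reading with plain 'utf-8' keeps it as a leading U+FEFF character.
    with open(path, "r", encoding="utf-8") as f:
        with_bom = f.read()

starts_with_bom = raw.startswith(codecs.BOM_UTF8)
print(starts_with_bom)
print(clean == text)
print(with_bom == "\ufeff" + text)
```

Reading such a file with plain 'utf-8' leaves the extra U+FEFF at the start, so code that reads files written this way should also use 'utf-8-sig'.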
Upvotes: 1
Reputation: 375814
Your error is not during decoding. It happens when you try to print. m is a Unicode string, successfully decoded from x. But when printing, Python tries to encode the string again to the encoding needed by your terminal. That encoding is cp1252, a Windows one-byte encoding. It cannot represent Odia, so the print fails.
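The same behavior can be reproduced without a real terminal. This is only a sketch: try_print is a hypothetical helper, and the in-memory stream stands in for a terminal with a given encoding.

```python
import io

text = "କାହିଁକି ଏଇଠି ଅଛୁ "

def try_print(text, encoding):
    """Print text to an in-memory stream that uses the given encoding.

    Returns the bytes written, or None if the text cannot be encoded.
    """
    raw = io.BytesIO()
    # newline="\n" disables platform-specific newline translation.
    stream = io.TextIOWrapper(raw, encoding=encoding, newline="\n")
    try:
        print(text, file=stream)
        stream.flush()
    except UnicodeEncodeError:
        return None
    return raw.getvalue()

# The decode step from the question succeeds on its own.
decoded = text.encode().decode()

cp1252_result = try_print(decoded, "cp1252")  # stands in for the cp1252 terminal
utf8_result = try_print(decoded, "utf-8")     # stands in for a UTF-8 terminal

print(cp1252_result is None)
print(utf8_result == text.encode("utf-8") + b"\n")
```

The failure comes from print() encoding for the output stream, not from decode().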
For question 3, you cannot easily create a new encoding. You need to set your terminal to use an encoding that can handle Odia, like UTF-8.
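On the doubt raised in the question about which encodings can represent Odia: the Unicode transformation formats all can. A quick check, using only the codec names built into Python:

```python
text = "କାହିଁକି ଏଇଠି ଅଛୁ "

# Each of these codecs can represent every Unicode code point,
# so encoding and then decoding returns the original string.
results = {}
for name in ("utf-8", "utf-16", "utf-32", "utf-7"):
    encoded = text.encode(name)
    results[name] = encoded.decode(name) == text

for name, ok in results.items():
    print(name, ok)
```

UTF-8 remains the usual choice; the others are listed only because the question mentions them.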
Upvotes: 3