Nihal Sharma

Reputation: 2437

How to get a single unicode character from its integer representation?

I did not want to post this question, but I have tried almost everything and nothing seems to work. On Python 2.7:

ord(unicode('₹', "utf-8"))

This produces 8377 as the output. How do I get '₹' from 8377?

unichr(8377) and chr(8377) do not work, as they throw an "ordinal not in range(128)" exception. I tried other things as well, but I think I am headed in the wrong direction.

Upvotes: 4

Views: 678

Answers (1)

Eric Duminil

Reputation: 54263

Problem

According to the documentation :

>>> unichr(8377)
u'\u20b9'

This should work on any Python 2.7 installation, on any system.

It does exactly what you asked: it returns a single unicode character from its integer representation. The interactive prompt doesn't display this character as ₹, though. Instead, it echoes the repr of the value, which is written with ASCII characters only.
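To make the difference between the character and its escaped spelling concrete, here is a minimal sketch. The try/except shim is an addition for this example only, so that the same lines also run on Python 3, where unichr() does not exist and chr() plays its role.

```python
# Compatibility shim (an assumption of this sketch, not part of the question):
# Python 3 has no unichr(); its chr() already returns a unicode string.
try:
    unichr
except NameError:
    unichr = chr

ch = unichr(8377)

# The value really is one character with the expected code point.
assert len(ch) == 1
assert ord(ch) == 8377
assert ch == u'\u20b9'

# An ASCII-only escaped spelling of the same character, similar to what
# the Python 2 prompt shows inside u'...'.
escaped = ch.encode('unicode_escape')
print(escaped.decode('ascii'))
```

The escaped form is only a way of writing the character down. The value held in ch is the single character itself.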

Depending on your terminal, print will either display the character correctly :

Python 2.7.6 (default, Oct 26 2016, 20:30:19) 
[GCC 4.8.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> unichr(8377)
u'\u20b9'
>>> print unichr(8377)
₹

or throw an error (PowerShell on Windows):

PS C:\Windows\System32\WindowsPowerShell\v1.0> python
Python 2.7.13 (v2.7.13:a06454b1afa1, Dec 17 2016, 20:42:59) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> print unichr(8377)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Program Files (x86)\Python2.7\lib\encodings\cp850.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\u20b9' in position 0: character maps to <undefined>
>>>
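The two sessions differ only in the encoding that print applies before writing to the terminal. A rough way to reproduce both outcomes on any machine is to call encode() explicitly. This sketch assumes the first terminal used UTF-8, which the transcript does not state, and takes cp850 from the traceback.

```python
ch = u'\u20b9'  # the rupee sign, the same value unichr(8377) returns on Python 2

# UTF-8 can represent every code point, so this succeeds.
utf8_bytes = ch.encode('utf-8')

# cp850 (the code page named in the traceback) has no slot for U+20B9,
# so the same call fails with the error seen in the PowerShell session.
try:
    ch.encode('cp850')
    cp850_failed = False
except UnicodeEncodeError as exc:
    cp850_failed = True
    print(exc)
```

So unichr() itself never fails here. The exception is raised at the encoding step, when the character has no representation in the terminal's code page.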

Possible solution

Your terminal needs to accept unicode characters.

This answer might help you :

import locale
print unichr(8377).encode(locale.getdefaultlocale()[1], 'replace')

Depending on your encoding, the character might be displayed correctly or as a ?.
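One caveat with the snippet above: locale.getdefaultlocale()[1] can be None when no locale is configured, and encode() would then raise a TypeError. A slightly more defensive variant might look like the sketch below. The name encode_for_terminal is a hypothetical helper introduced for illustration, not a standard library function, and the fallback order is an assumption.

```python
import locale
import sys


def encode_for_terminal(text, encoding=None, errors='replace'):
    """Encode text for output, substituting characters that do not fit.

    Hypothetical helper for illustration; not part of the standard library.
    """
    if encoding is None:
        # sys.stdout.encoding can be None (for example when output is piped
        # on Python 2), so fall back to the locale, then to ASCII.
        encoding = (getattr(sys.stdout, 'encoding', None)
                    or locale.getpreferredencoding()
                    or 'ascii')
    return text.encode(encoding, errors)


rupee = u'\u20b9'
print(repr(encode_for_terminal(rupee, 'utf-8')))  # bytes of the real character
print(repr(encode_for_terminal(rupee, 'cp850')))  # a single '?' byte instead
```

On Python 2, print encode_for_terminal(unichr(8377)) then writes either the character or a ?, without raising.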

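The ? comes from the 'replace' error handler, which substitutes any character the target encoding cannot represent. Related display failures are empty boxes ("tofu", when the font has no glyph for the character) and garbled text ("mojibake", when bytes are decoded with the wrong encoding). None of these is a Python problem: they come from the underlying terminal (e.g. PowerShell).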

Those threads might help you.

Upvotes: 4
