Clément

Reputation: 23

Unicode for an emoji in a string variable isn't shown as the emoji

First of all, sorry for my poor and approximate English...

I'm writing a Python script that should receive a variable holding the Unicode code of an emoji (U000xxxx). The final goal of this part of the program is to translate that code into the name of the emoji.

I know that in Python you display an emoji with print("\U000XXXXX"), so I added the \ in front of the previous value. But when I print it, the rendering is not the one expected:

unicode = "U0001f0cf"
unicode = (f"\{unicode}") #OR# unicode = "\%s" %unicode
print (unicode) #>>> \U0001f0cf
#Expected >>> 🃏

I tried a lot of things, including .encode(), but Python told me I couldn't use a string pattern on an object of type bytes (?).

This is the part that is causing me problems; all the rest of the process is OK... To translate the Unicode code into the name of the emoji, I found this method (adapted from another Stack Overflow topic):

name = emojis.decode(unicode).replace("_"," ").replace(":","")
print(name) #>>> \U0001f0cf

Whereas if I enter the Unicode code directly, it works...

name = emojis.decode("U0001f0cf").replace("_"," ").replace(":","")
print(name) #>>> :black_joker:

Thank you very much to anyone who tries to help me. Have a good evening!

Upvotes: 2

Views: 2311

Answers (5)

Mark Tolonen

Reputation: 177406

First get the numeric part out of the variable, then use chr() to convert it to the corresponding character, then use the unicodedata database to fetch its name:

import unicodedata as ud

u = 'U0001f0cf'
i = int(u[1:],16)
c = chr(i)
n = ud.name(c)
print(c,n)

Output:

🃏 PLAYING CARD BLACK JOKER

You can also use a range loop to display a number of emoji:

import unicodedata as ud

for i in range(0x1f0c1,0x1f0d0):
    c = chr(i)
    n = ud.name(c)
    print(c,n)

Output:

🃁 PLAYING CARD ACE OF DIAMONDS
🃂 PLAYING CARD TWO OF DIAMONDS
🃃 PLAYING CARD THREE OF DIAMONDS
🃄 PLAYING CARD FOUR OF DIAMONDS
🃅 PLAYING CARD FIVE OF DIAMONDS
🃆 PLAYING CARD SIX OF DIAMONDS
🃇 PLAYING CARD SEVEN OF DIAMONDS
🃈 PLAYING CARD EIGHT OF DIAMONDS
🃉 PLAYING CARD NINE OF DIAMONDS
🃊 PLAYING CARD TEN OF DIAMONDS
🃋 PLAYING CARD JACK OF DIAMONDS
🃌 PLAYING CARD KNIGHT OF DIAMONDS
🃍 PLAYING CARD QUEEN OF DIAMONDS
🃎 PLAYING CARD KING OF DIAMONDS
🃏 PLAYING CARD BLACK JOKER
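The unicodedata lookup also works in reverse: unicodedata.lookup() takes an official character name and returns the character, which complements the name-from-character direction shown above (a small sketch):

```python
import unicodedata as ud

# Reverse lookup: from the official Unicode name back to the character
c = ud.lookup("PLAYING CARD BLACK JOKER")
print(c)                  # 🃏
print(f"U{ord(c):08x}")   # U0001f0cf -- back to the question's format
```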

Upvotes: 2

Kesslwovv

Reputation: 652

The simple way of getting the Unicode character is to include the backslash in the first place:

unicode = "\U0001f0cf"
print (unicode) #>>> 🃏

The other way is more complex and a bit ugly due to the use of eval:

unicode = "U0001f0cf"
unicode = eval(f'"\\{unicode}"')
print(unicode) #>>> 🃏

In this case f'"\\{unicode}"' evaluates to '"\U0001f0cf"', and eval then interprets that string as a string literal ("\U0001f0cf" becomes 🃏).

Edit (because of tripleee's comment):

eval is insecure when used with user input, because the user can execute arbitrary code (including OS commands). But as long as you only run the code yourself, this is not an issue.
Alternatives are:

  • ast.literal_eval, as in Wombatz's answer, for safe evaluation
  • chr, as in tripleee's answer, which is a very elegant and fitting solution
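To illustrate the difference: ast.literal_eval only accepts Python literals, so input that tries to call code is rejected rather than executed (a minimal sketch using a hypothetical malicious string):

```python
from ast import literal_eval

malicious = "__import__('os').system('echo pwned')"
# eval(malicious) would run the shell command; literal_eval refuses:
try:
    literal_eval(malicious)
except ValueError as exc:
    print("rejected:", exc)
```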

Upvotes: -2

tripleee

Reputation: 189317

You are confused about the meaning of the backslash. In Python source code, "\U0001f0cf" encodes a single character in a string. You can't turn the nine-character string "U0001f0cf" into a single character by adding a backslash in front, any more than concatenating a literal backslash in front of "n" turns it into a newline.

What you can do easily is drop the U and convert that hex number into a character via chr().

unicode = "U0001f0cf"
print(chr(int(unicode[1:], 16)))

int(string, base) converts string to a number in the specified base.
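For example (a quick sketch of int() with an explicit base, applied to the question's variable):

```python
# int() parses a string as an integer in the given base
print(int("1f0cf", 16))  # 127183 (hexadecimal)
print(int("101", 2))     # 5 (binary)

# Applied to the question's variable: strip the leading U, parse as hex
unicode = "U0001f0cf"
print(chr(int(unicode[1:], 16)))  # 🃏
```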

Upvotes: 3

Johan Jomy

Reputation: 691

unicode = "U0001f0cf"
unicode = (f"\{unicode}")

print(unicode.encode('raw-unicode-escape').decode('unicode-escape'))

This gives you 🃏 instead of \U0001f0cf.
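The trick here: raw-unicode-escape encodes the string to bytes while leaving the literal backslash untouched, and unicode-escape then decodes the \U sequence as an escape. A sketch of the intermediate steps (with the caveat that unicode-escape treats the bytes as Latin-1, so it can mangle other non-ASCII characters in the same string):

```python
s = "\\U0001f0cf"                    # backslash + U0001f0cf, as in the question
b = s.encode("raw-unicode-escape")   # b'\\U0001f0cf': the backslash survives as a byte
print(b)
result = b.decode("unicode-escape")  # the \U escape sequence is now interpreted
print(result)                        # 🃏
```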

Upvotes: 3

Wombatz

Reputation: 5448

You can use ast.literal_eval for this.

We can build a valid Python string literal containing a Unicode escape sequence. We just have to add the surrounding " characters:

from ast import literal_eval

user_input = 'U0001f0cf'
emoji_literal = f'"\\{user_input}"'
#                 ^              ^
#                here         and here
print(emoji_literal)   # prints "\U0001f0cf"
repaired_emoji = literal_eval(emoji_literal)
print(repaired_emoji)  # prints 🃏

emoji_literal contains "\U0001f0cf", which is exactly what you would type if you didn't have a variable.

ast.literal_eval then interprets the string as if we had used it as a string literal in Python source.
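The added quotes matter: without them the input is not a valid Python literal at all, and literal_eval raises instead of returning a string (a small sketch; the exact exception type can vary between Python versions, hence the broad catch):

```python
from ast import literal_eval

print(literal_eval('"\\U0001f0cf"'))   # 🃏 -- a proper string literal

try:
    literal_eval('\\U0001f0cf')        # no quotes: not a literal
except (ValueError, SyntaxError):
    print("not a valid Python literal")
```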

Upvotes: 0
