Reputation: 23
First of all, sorry for my poor and approximate English...
I'm trying to write a Python script that receives a variable representing the Unicode code of an emoji (U000xxxx). The final goal of this part of the program is to translate that code into the emoji's name.
Since I know that in Python an emoji is displayed with print("\U000XXXXX"), I added the \ in front of the value.
But when I print it, the result is not what I expected:
unicode = "U0001f0cf"
unicode = (f"\{unicode}") #OR# unicode = "\%s" %unicode
print (unicode) #>>> \U0001f0cf
#Expected >>> 🃏
I tried a lot of things, including .encode(), but Python told me I couldn't use a string pattern on an object of type bytes (?).
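(I suspect this error came from the regex the emojis library applies internally; a minimal reproduction, assuming that's the cause:)
import re
re.search("joker", "🃏".encode())
# TypeError: cannot use a string pattern on a bytes-like object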
This is the part that is causing me problems; all the rest of the process is OK... To translate the Unicode code into the emoji's name, I found this method (adapted from another Stack Overflow topic):
name = emojis.decode(unicode).replace("_"," ").replace(":","")
print(name) #>>> \U0001f0cf
Whereas if I enter the Unicode escape directly, it works...
name = emojis.decode("\U0001f0cf").replace("_"," ").replace(":","")
print(name) #>>> black joker
Thank you very much to anyone who tries to help me. Have a good evening!
Upvotes: 2
Views: 2311
Reputation: 177406
First get the numeric part from the variable, then use chr() to convert it to its Unicode character, then use the unicodedata database to fetch its name:
import unicodedata as ud
u = 'U0001f0cf'
i = int(u[1:], 16)  # strip the leading "U" and parse the hex digits
c = chr(i)          # code point -> character
n = ud.name(c)      # character -> official Unicode name
print(c, n)
Output:
🃏 PLAYING CARD BLACK JOKER
You can also use a range loop to display a number of emoji:
import unicodedata as ud
for i in range(0x1f0c1, 0x1f0d0):
    c = chr(i)
    n = ud.name(c)
    print(c, n)
Output:
🃁 PLAYING CARD ACE OF DIAMONDS
🃂 PLAYING CARD TWO OF DIAMONDS
🃃 PLAYING CARD THREE OF DIAMONDS
🃄 PLAYING CARD FOUR OF DIAMONDS
🃅 PLAYING CARD FIVE OF DIAMONDS
🃆 PLAYING CARD SIX OF DIAMONDS
🃇 PLAYING CARD SEVEN OF DIAMONDS
🃈 PLAYING CARD EIGHT OF DIAMONDS
🃉 PLAYING CARD NINE OF DIAMONDS
🃊 PLAYING CARD TEN OF DIAMONDS
🃋 PLAYING CARD JACK OF DIAMONDS
🃌 PLAYING CARD KNIGHT OF DIAMONDS
🃍 PLAYING CARD QUEEN OF DIAMONDS
🃎 PLAYING CARD KING OF DIAMONDS
🃏 PLAYING CARD BLACK JOKER
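Note that ud.name() raises a ValueError for unassigned code points; when scanning larger ranges, you can pass a default instead of letting it raise. A small variant of the loop above (the '(unnamed)' placeholder is just an example):
import unicodedata as ud
for i in range(0x1f0c1, 0x1f0d0):
    c = chr(i)
    n = ud.name(c, '(unnamed)')  # default is returned instead of raising ValueError
    print(c, n)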
Upvotes: 2
Reputation: 652
The simple way of getting the Unicode character is to include the backslash in the first place:
unicode = "\U0001f0cf"
print (unicode) #>>> 🃏
The other way is more complex and a bit ugly due to the use of eval:
unicode = "U0001f0cf"
unicode = eval(f'"\\{unicode}"')
print(unicode) #>>> 🃏
In this case, f'"\\{unicode}"' evaluates to '"\U0001f0cf"', and eval then evaluates the string literal inside it ("\U0001f0cf" becomes 🃏).
Edit (because of tripleee's comment): eval is insecure when used with user input, because the user can evaluate arbitrary code (including OS commands). But as long as you only use the code yourself, this is not an issue.
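A harmless illustration, assuming user_input is attacker-controlled (os.getcwd() stands in for something nastier):
user_input = '__import__("os").getcwd()'  # hypothetical hostile input
print(eval(user_input))                   # eval runs the OS call instead of building a string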
Alternatives are:
- ast.literal_eval, as in Wombatz's answer, for safe evaluation
- chr, as in tripleee's answer, which is a very elegant and fitting solution.
Upvotes: -2
Reputation: 189317
You are confused about the meaning of the backslash. In Python source code, "\U0001f0cf" encodes a single character in a string. You can't turn the nine-character string "U0001f0cf" into a single character by adding a backslash in front, any more than concatenating a literal backslash in front of "n" turns it into a newline.
What you can do easily is drop the U and convert the remaining hex number into a character via chr().
unicode = "U0001f0cf"
print(chr(int(unicode[1:], 16)))
int("string", base)
converts string
to a number in the specified base
.
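For example, the intermediate values look like this:
u = "U0001f0cf"
i = int(u[1:], 16)  # parse "0001f0cf" as base-16
print(i)            # 127183
print(hex(i))       # 0x1f0cf
print(chr(i))       # 🃏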
Upvotes: 3
Reputation: 691
unicode = "U0001f0cf"
unicode = (f"\{unicode}")
print(unicode.encode('raw-unicode-escape').decode('unicode-escape'))
This gives you 🃏 instead of \U0001f0cf.
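To see why this works, it helps to print the intermediate value (a quick sketch):
s = "\\U0001f0cf"                   # a literal backslash followed by U0001f0cf
b = s.encode('raw-unicode-escape')  # plain ASCII text passes through unchanged
print(b)                            # b'\\U0001f0cf'
print(b.decode('unicode-escape'))   # the decoder interprets the \U escape: 🃏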
Upvotes: 3
Reputation: 5448
You can use ast.literal_eval for this. We can build a valid string literal containing a Unicode escape sequence for Python; we just have to add the surrounding " characters.
from ast import literal_eval
user_input = 'U0001f0cf'
emoji_literal = f'"\\{user_input}"'
#                 ^              ^
#                here and here
print(emoji_literal)  # prints "\U0001f0cf"
repaired_emoji = literal_eval(emoji_literal)
print(repaired_emoji)  # prints 🃏
The emoji_literal string contains "\U0001f0cf", which is exactly what you would type if you didn't have a variable. ast.literal_eval then interprets the string as if we had used it as a string literal in Python.
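Unlike eval, literal_eval accepts only Python literals, so hostile input cannot execute anything. A small demonstration:
from ast import literal_eval
print(literal_eval('"\\U0001f0cf"'))  # 🃏 -- a plain string literal is accepted
try:
    literal_eval('__import__("os")')  # a function call is not a literal
except ValueError as e:
    print("rejected:", e)             # literal_eval raises instead of executing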
Upvotes: 0