Kironmoy Mukherjee

Reputation: 31

How to store strings in variables, and then use those variables in other functions or methods in Python?

So I came across this simple problem of printing emojis in Python. I assume there are multiple ways to do so, but these are the three major ones that I found:

  1. Using the Unicode code point of the emoji
  2. Using CLDR names of the emoji
  3. Using the emoji module

I wanted to make a program (using each of the three methods) where we take an input from the user asking which emoji they want to print, and then print it in the next line.

This meant: if the program were created using method 1, the user would have to input a Unicode code point. If it were created using method 2, they would have to input the CLDR name. If it were created using method 3, they would have to input the name of the emoji (based on the syntax of the emoji module).

The actual problem that I'm facing is in storing the user's input in a variable and then using that variable to generate an emoji. The input gets stored as a string, so the print command simply prints the string as-is instead of treating it as a Unicode escape.

For method 1: I tried

user_emoji = input("Enter the unicode:- ")

print(r"\{}".format(user_emoji))

But this just gave me the following when I entered the code point:

\U0001f0cf

I found a solution for this when I looked it up online (here), but I didn't really understand what it was actually doing.

For method 2: I tried

user_emoji = input("Enter the CLDR name:- ")

print(r"\N{}".format(user_emoji))

But once again I got just normal text when I entered 'slightly smiling face'.

\Nslightly smiling face

Here I thought that if there were a way to convert the CLDR name to a Unicode code point, I could use the solution from above and brute-force my way to the result, but I couldn't find a way to do that either.

For method 3 I tried

import emoji

user_emoji = input("Enter the emoji name:- ")
user_emoji = user_emoji.replace(" ","_")

print(emoji.emojize(r':{}:'.format(user_emoji)))

This was the only method that gave me the result I wanted when I gave 'slightly smiling face' as an input.

🙂

Hope someone can explain how the solution in method 1 works and what I need to do to make method 2 work.

Upvotes: 2

Views: 373

Answers (3)

Andj

Reputation: 1447

Emoji can be divided into two broad categories: simple emoji (consisting of a single Unicode codepoint) and complex emoji (consisting of multiple codepoints for a single emoji).

To illustrate the complexity, I have drawn a few emoji from the Full Emoji List, v15.1 using both codepoints and the emoji's CLDR Short Name. See table below.

The OP wanted to be able to print emoji given 1) the codepoints, 2) the emoji name, 3) using the emoji package.

Currently the only way to get the emoji character by name is to use the emoji package, and since the emoji package supports aliases and a range of parameters, the most practical approach is to combine 2) and 3) into one and just support codepoints and the CLDR short names.

Python supports the \N{} syntax for specifying characters in a string. It is also possible to use the unicodedata module. But these approaches can only support simple emoji, and will not work for complex emoji.
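
A minimal sketch of that limitation, using only the standard library. The helper name lookup_or_none is hypothetical and exists only for illustration. unicodedata.lookup raises KeyError for a name it does not know, which is what happens when the name of a multi-codepoint emoji is passed in:

```python
import unicodedata

def lookup_or_none(name):
    """Return the character with this Unicode name, or None if there is none.

    Hypothetical helper for illustration; not part of any library.
    """
    try:
        return unicodedata.lookup(name)
    except KeyError:
        return None

# A simple emoji is a single codepoint with its own Unicode character name.
simple = lookup_or_none("slightly smiling face")

# A complex emoji is a sequence of codepoints; its CLDR short name is not
# a Unicode character name, so the lookup finds nothing.
complex_emoji = lookup_or_none("family: man, woman, girl, boy")

print(simple, complex_emoji)
```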

I tend to use the regex module instead of re, but the regex patterns can be rewritten for re.

The code:

  1. reads the terminal input
  2. tests whether the input is a sequence of hexadecimal codepoints, each four to five digits long
  3. if so, splits the input, converts each codepoint to a character, and joins the characters; otherwise it cleans the input string and uses the emoji package to convert it to characters
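
import regex
import emoji

emoji_input = input("Enter the emoji:- ")

# One or more hexadecimal codepoints (4 to 5 digits each), optionally
# separated by whitespace.
hex_pattern = regex.compile(r'^(?:(\p{Hex_Digit}{4,5})\p{White_Space}*)+$')
if hex_pattern.match(emoji_input):
    # Codepoint input: convert each hex value to its character and join them.
    # The loop variable is not called "emoji", so it does not shadow the module.
    chars = [chr(int(codepoint, 16)) for codepoint in emoji_input.split()]
    emoji_out = "".join(chars)
else:
    # Name input: "family: man, woman" becomes "family_man_woman" for emojize().
    emoji_input = regex.sub(r'[:,\p{White_Space}]+', '_', emoji_input)
    emoji_out = emoji.emojize(f":{emoji_input}:")
print(emoji_out)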
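
As mentioned, the pattern can be rewritten for re. The following is only a sketch of the codepoint branch using the standard library; the function name codepoints_to_text and the constant HEX_SEQUENCE are hypothetical, and the name branch (which needs the emoji package) is left out:

```python
import re

# One or more hex codepoints (4 to 5 digits each) separated by whitespace.
HEX_SEQUENCE = re.compile(r"[0-9A-Fa-f]{4,5}(?:\s+[0-9A-Fa-f]{4,5})*")

def codepoints_to_text(text):
    """Convert e.g. "2764 FE0F" to the string those codepoints spell.

    Returns None when the input is not a sequence of hex codepoints.
    """
    text = text.strip()
    if not HEX_SEQUENCE.fullmatch(text):
        return None
    return "".join(chr(int(codepoint, 16)) for codepoint in text.split())

print(codepoints_to_text("1F468 200D 1F469 200D 1F467 200D 1F467"))
print(codepoints_to_text("woman gesturing OK"))  # None: not hex codepoints
```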

Example input and output from code:

Input                                     Output
woman gesturing OK                        🙆‍♀️
family: man, woman, girl, boy             👨‍👩‍👧‍👦
family man woman girl boy                 👨‍👩‍👧‍👦
1F468 200D 1F469 200D 1F467 200D 1F467    👨‍👩‍👧‍👧
2764 FE0F                                 ❤️
backhand index pointing left              👈
1F64F                                     🙏

Upvotes: 1

Brentspine

Reputation: 676

Method 1

To fix method 1, we convert the input into an integer using int (with base 16) and then use chr to convert that integer to the corresponding Unicode character. This way, you get the actual emoji.

user_emoji = input("Enter the unicode:- ")
print(chr(int(user_emoji, 16)))

If we then input 1F642, which is the code point for "slightly smiling face", we get the correct output: 🙂
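
Users may type the code point in other common notations, such as U+1F642 or \U0001F642. A small sketch that accepts those too; the function name parse_code_point and the list of accepted prefixes are assumptions made for illustration:

```python
def parse_code_point(text):
    # Accepts "1F642", "U+1F642" or "\U0001F642" typed as plain characters.
    cleaned = text.strip().lower()
    for prefix in ("u+", "\\u"):
        if cleaned.startswith(prefix):
            cleaned = cleaned[len(prefix):]
            break
    # int() raises ValueError if what remains is not a hexadecimal number.
    return chr(int(cleaned, 16))

print(parse_code_point("U+1F642"))
```

In a real program the ValueError would need to be caught so that a typo does not crash it.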

Method 2

Method 2, on the other hand, I wouldn't recommend. It isn't very straightforward, I know of no good solution, and you should be fine using Method 1. If you really want to use CLDR names, you will need to create a mapping or use a library that provides one.

A simple mapping dictionary could look like this:

cldr_mapping = {
    "slightly smiling face": "\U0001F642",
}

user_emoji = input("Enter the CLDR name:- ")
print(cldr_mapping.get(user_emoji, "Emoji not found bro"))
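
If typing the dictionary by hand is too tedious, one possible sketch fills it from the standard library's unicodedata module. The function name build_name_mapping and the default range (the Emoticons block, U+1F600 to U+1F64F) are illustrative choices. Note that the keys are Unicode character names, which match the CLDR name for many emoji but not for all of them:

```python
import unicodedata

def build_name_mapping(start=0x1F600, end=0x1F64F):
    """Map lower-cased Unicode character names to characters for a range."""
    mapping = {}
    for code_point in range(start, end + 1):
        char = chr(code_point)
        name = unicodedata.name(char, None)  # None for unassigned code points
        if name is not None:
            mapping[name.lower()] = char
    return mapping

cldr_mapping = build_name_mapping()

user_emoji = "Slightly Smiling Face"  # stands in for input(...)
print(cldr_mapping.get(user_emoji.strip().lower(), "Emoji not found"))
```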

Upvotes: 0

juanpa.arrivillaga

Reputation: 96257

You are fundamentally misunderstanding the difference between a string literal (which is source code that creates the string object) and a string (the actual object).

If I write

"\n" 

in source code, that is a string literal which evaluates to a string with a single character, the newline character.

>>> print("\n")


>>> len("\n")
1
>>> list("\n")
['\n']

If I write r"\{}".format('n') this creates a string, which looks like the string literal for a newline, but it isn't source code. It is a string with two characters, the backslash character and the 'n' character:

>>> string = r"\{}".format('n')
>>> print(string)
\n
>>> len(string)
2
>>> list(string)
['\\', 'n']
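
The same comparison for the input from the question, as a short sketch (the variable names are mine):

```python
# The escape is processed when Python compiles this source code.
literal = "\U0001f0cf"

# Built at runtime from ordinary characters; no escape processing happens.
built = r"\{}".format("U0001f0cf")

print(len(literal))      # 1
print(len(built))        # 10
print(literal == built)  # False
```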

If you want to accept the Unicode code point, that is simply a number; in your case, you seem to provide it in base 16. All the code you linked to does is convert a string which represents a base-16 number to an int object (the second argument is the base, which defaults to 10), and then it uses the chr function to retrieve the Unicode character for that number:

>>> number_string = "0001f0cf"
>>> int(number_string, 16)
127183
>>> chr(127183)
'🃏'
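
Going the other way can help when checking input: ord gives the number back, and it can be formatted as hex again. A short sketch:

```python
char = chr(int("0001f0cf", 16))

code_point = ord(char)
print(code_point)                 # 127183
print(format(code_point, "08x"))  # 0001f0cf
print(f"U+{code_point:04X}")      # U+1F0CF
```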

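Finally, if you want to use the CLDR name, you can use the unicodedata module (part of the standard library). It looks characters up by their Unicode character name, which for this emoji is the same as the CLDR name: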

>>> import unicodedata
>>> cldr_name = "slightly smiling face"
>>> print(unicodedata.lookup(cldr_name))
🙂
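
To tie this back to the \N{...} attempt in the question, here is a short sketch showing that the literal form and the runtime lookup produce the same string (the variable names are mine):

```python
import unicodedata

# Resolved by the compiler, because it is written in source code.
from_literal = "\N{SLIGHTLY SMILING FACE}"

# Resolved at runtime from an ordinary string, such as user input.
from_lookup = unicodedata.lookup("slightly smiling face")

print(from_literal == from_lookup)    # True
print(unicodedata.name(from_lookup))  # SLIGHTLY SMILING FACE
```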

Upvotes: 3
