Emojis not showing up properly using the Python emoji package

Question

I wrote a script that extracts all emojis from a given dataset:

import emoji

# df is a pandas DataFrame with a 'Message' column of strings
for message in df['Message']:
    for char in message:
        if char in emoji.UNICODE_EMOJI:
            print(char)

It kind of works and correctly identifies which characters are emojis. However, some of the emojis are not displayed correctly and simply show up as a brown square:

🏽

Why is this happening? Is there any way of solving this? Most emojis show up just fine but there are a few that just won't.

Edit: After looking into it again, it seems the brown squares accompany certain emojis to indicate the skin tone being used.

However, there are still issues with certain emojis. The usual heart emoji, for example, does show up as a heart character, but not in the emoji style. Here is a screenshot, since pasting it here ends up displaying it correctly:

runDOSrun · Accepted Answer

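The issue is that skin-tone variants (not just dark ones) are encoded as two separate code points instead of one: 👍🏿 is the base emoji 👍 followed by the modifier 🏿, which sets the skin tone. Iterating over the string character by character splits the pair apart, and a modifier printed on its own is displayed as a colored square.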

You can see it from this example:

import emoji
import pandas as pd

df = pd.DataFrame({"Message": ["test 👍🏿 "]})
for message in df['Message']:
    for char in message:  # iterates one code point at a time
        if char in emoji.UNICODE_EMOJI:
            print(char)

output:

👍
🏿
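To confirm that this really is two code points rather than a display glitch, Python's standard `unicodedata` module can name each one. This check is only an illustration and does not use the emoji package:

```python
import unicodedata

message = "\U0001F44D\U0001F3FF"  # 👍🏿 written with escapes

# Iterating over a str yields one code point at a time
names = [unicodedata.name(char) for char in message]
for char, name in zip(message, names):
    print(f"U+{ord(char):04X} {name}")
# U+1F44D THUMBS UP SIGN
# U+1F3FF EMOJI MODIFIER FITZPATRICK TYPE-6
```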

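So you will have to split the text into user-perceived characters (extended grapheme clusters) instead, for example with the \X pattern of the third-party regex module (as per this answer):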

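import emoji
import pandas as pd
import regex

df = pd.DataFrame({"Message": ["test 👍🏿 ", "test 2 👍 👍"]})

def split_count(text):
    emoji_list = []
    # \X matches one extended grapheme cluster, so 👍🏿 stays a single item
    data = regex.findall(r'\X', text)
    for word in data:
        # Keep the cluster if any code point in it is an emoji.
        # UNICODE_EMOJI comes from older emoji releases; emoji 2.x uses emoji.EMOJI_DATA.
        if any(char in emoji.UNICODE_EMOJI for char in word):
            emoji_list.append(word)
    return emoji_list

for message in df['Message']:
    emojis = split_count(message)
    print(' '.join(emojis))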

output:

👍🏿
👍 👍
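The heart from the question's edit is probably a related case (an assumption, since the screenshot is not available here): the red-heart emoji ❤️ is usually U+2764 HEAVY BLACK HEART followed by U+FE0F VARIATION SELECTOR-16, which requests emoji-style rendering. Iterating character by character prints U+2764 without the selector, so it typically falls back to the plain text-style heart. The \X approach above keeps such pairs together. If installing regex is not an option, a rough stdlib-only approximation is to glue skin-tone modifiers, variation selectors and zero-width-joiner sequences onto the preceding character. The helper `group_emoji_sequences` below is hypothetical and much simpler than real grapheme segmentation (Unicode UAX #29); for example, it does not handle flags or keycaps.

```python
ZWJ = "\u200d"   # ZERO WIDTH JOINER: glues emojis into one sequence
VS16 = "\ufe0f"  # VARIATION SELECTOR-16: requests emoji presentation
# Fitzpatrick skin-tone modifiers U+1F3FB..U+1F3FF
SKIN_TONES = {chr(cp) for cp in range(0x1F3FB, 0x1F400)}


def group_emoji_sequences(text):
    """Hypothetical helper: split text into chunks, attaching skin-tone
    modifiers, VS16 and ZWJ-joined characters to what precedes them.
    A rough approximation of grapheme clusters, not full UAX #29."""
    chunks = []
    for char in text:
        attaches = (
            char in SKIN_TONES
            or char == VS16
            or char == ZWJ
            or (chunks and chunks[-1].endswith(ZWJ))
        )
        if chunks and attaches:
            chunks[-1] += char  # extend the previous chunk
        else:
            chunks.append(char)  # start a new chunk
    return chunks


heart = "\u2764\ufe0f"  # ❤️ = HEAVY BLACK HEART + VARIATION SELECTOR-16
print(len(heart))       # 2 code points, so char-by-char iteration splits them
chunks = group_emoji_sequences("test \U0001F44D\U0001F3FF " + heart)
print(chunks)           # 👍🏿 and ❤️ each come out as one chunk
```

To reproduce the answer's filtering on top of this, keep only the chunks that contain an emoji code point, using the same membership check as `split_count`.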
