m4sh4

Reputation: 79

Find emojis in a text using python

Hello, I am trying to find all emojis in downloaded tweets using Python 2.7.

I've tried the following code:

import os
import codecs
import emoji
from nltk.tokenize import word_tokenize

def extract_emojis(token):
    emoji_list = []
    if token in emoji.UNICODE_EMOJI:
        emoji_list.append(token)
    return emoji_list

for tweet in os.listdir(tweets_path):
    with codecs.open(tweets_path+tweet, 'r', encoding='utf-8') as input_file:
        line = input_file.readline()
        while line:
            line = word_tokenize(line)
            for token in line:
                print extract_emojis(token)

            line = input_file.readline()

However, I only get empty lists instead of the emojis. For example, given the following tweet:

schuld van de sossen 😡 SP.a: wij hebben niks gedaan 😴 Groen: we gaan energie VERBIEDEN!

the output of the code is

[]

instead of the desired output:

[😡, 😴]

Any help?? Thanks!

Upvotes: 4

Views: 5619

Answers (4)

hafiz031

Reputation: 2730

Try this:

import emoji

text = "Hello! 👋 How are you doing today? 😊 It’s a beautiful day, isn’t it? 🌞 Don’t forget to take a break and enjoy a cup of coffee ☕ or tea 🍡. Remember, you’re doing great! 👍 Keep up the good work! 💪"
emojis = [c for c in text if c in emoji.EMOJI_DATA]

print(emojis)

Output:

['👋', '😊', '🌞', '☕', '🍡', '👍', '💪']
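One caveat worth knowing: a per-character check like the comprehension above looks at one code point at a time, so an emoji built from several code points (flags, skin-tone variants, ZWJ sequences) would be split or missed. The sketch below uses only the standard library to show what such a loop actually sees; the sample strings are my own illustrations, not taken from the question. Recent releases of the emoji package also provide emoji.emoji_list(text), which, to my knowledge, reports whole sequences instead.

```python
# Each string below renders as ONE glyph but holds several code points.
samples = {
    "flag": "\U0001F1E7\U0001F1EA",                          # regional indicators B + E
    "thumbs up, dark skin tone": "\U0001F44D\U0001F3FF",     # base emoji + modifier
    "family": "\U0001F468\u200D\U0001F469\u200D\U0001F467",  # joined with U+200D
}

# Map each label to its code points, as a per-character loop would see them.
split_view = {
    label: ["U+%04X" % ord(ch) for ch in glyph]
    for label, glyph in samples.items()
}

for label, code_points in split_view.items():
    print(label, samples[label], len(code_points), code_points)
```

If the text may contain such sequences, matching whole emoji strings rather than single characters is probably the safer choice.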

Upvotes: 0

Littin Rajan

Reputation: 897

There are various ways by which we can extract the emojis from a string in Python.

One of the most prominent is the emoji library.

If you are processing a file, make sure that you read it with utf-8 encoding (and use utf-8-sig when saving).

Here I will show how to list all the emojis present in a string, along with the number of emojis and the type of each emoji.

Code:

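# import required libraries
import re  # needed for re.findall below
import emoji
# assumes an older release of the emoji package, where UNICODE_EMOJI is a flat {emoji: name} dict
from emoji import UNICODE_EMOJI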

#getting all emojis as lists
all_emojis = list(UNICODE_EMOJI.keys())

#defining sentence
sentence = "schuld van de sossen 😡 SP.a: wij hebben niks gedaan 😴 Groen: we gaan energie VERBIEDEN!"

#getting Emoji Count
emoji_count = sum([sentence.count(emoj) for emoj in UNICODE_EMOJI])
#listing all Emojis
listed_emojis = ','.join(re.findall(f"[{''.join(all_emojis)}]", str(sentence)))
#listing all Emoji Types
emoji_types = ','.join([UNICODE_EMOJI[detect_emoji].upper()[1:-1] for detect_emoji in listed_emojis.split(',')])

#Displaying Sentence, Emoji Count, Emojis and Emoji Types
print(f"Sentence: {sentence}\nListed Emojis: {listed_emojis}\nCount: {emoji_count}\nEmoji Types: {emoji_types}")

Output:

Sentence: schuld van de sossen 😡 SP.a: wij hebben niks gedaan 😴 Groen: we gaan energie VERBIEDEN!
Listed Emojis: 😡,😴
Count: 2
Emoji Types: POUTING_FACE,SLEEPING_FACE
Count: 2
Emoji Types: POUTING_FACE,SLEEPING_FACE
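A possible refinement, offered as a sketch rather than as part of the code above: a character class such as [...] treats every code point of every emoji as a separate alternative, so entries like keycap emojis (which contain '#', '*' or digits) could make it match ordinary text. Building an alternation of escaped emoji strings avoids that. EMOJI_NAMES below is a hypothetical three-entry stand-in for the library's table, and the sample sentence is made up for illustration.

```python
import re

# Hypothetical stand-in for the library's emoji table (assumption, not the real data).
EMOJI_NAMES = {
    '\U0001F621': ':pouting_face:',
    '\U0001F634': ':sleeping_face:',
    '#\uFE0F\u20E3': ':keycap_#:',  # multi-code-point entry that contains '#'
}

# Longest entries first, so multi-code-point emojis are tried before shorter ones;
# re.escape keeps characters such as '#' or '*' literal.
pattern = re.compile('|'.join(
    re.escape(e) for e in sorted(EMOJI_NAMES, key=len, reverse=True)))

sentence = 'issue #12 \U0001F621 fixed \U0001F634'
found = pattern.findall(sentence)
names = [EMOJI_NAMES[e].strip(':').upper() for e in found]
print(found, names)
```

Here the plain '#12' in the sentence is not reported, because '#' only matches as part of the full keycap sequence.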

I hope this is helpful. If anyone has a question, please write here and I will try to fix it. :)

Upvotes: 0

Sushant

Reputation: 3669

This works in Python 2:

import emoji

x = "schuld van de sossen 😡 SP.a: wij hebben niks gedaan 😴 Groen: we gaan energie VERBIEDEN!"
# x is a UTF-8 byte string in Python 2, so each token is decoded before the lookup
[i for i in x.split() if unicode(i, "utf-8") in emoji.UNICODE_EMOJI]

# Output
['\xf0\x9f\x98\xa1', '\xf0\x9f\x98\xb4']
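The byte strings in that output are the UTF-8 encodings of the two emojis. As a small sketch in Python 3 (an assumption on my part, since the answer itself targets Python 2), decoding them shows the code points they stand for:

```python
# Byte strings copied from the Python 2 output above.
raw_items = [b'\xf0\x9f\x98\xa1', b'\xf0\x9f\x98\xb4']

# Decoding the UTF-8 bytes yields one code point per emoji.
decoded = [item.decode('utf-8') for item in raw_items]

print(decoded)
print([hex(ord(ch)) for ch in decoded])
```

In Python 3 a str is already Unicode, so the unicode(...) call would not be needed there.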

Upvotes: 1

Irfanuddin

Reputation: 2605

Make sure that your text is decoded from UTF-8: text.decode('utf-8')

To locate all emojis in your text, split the text character by character: [ch for ch in decoded]

Save all emojis in a list: [c for c in allchars if c in emoji.UNICODE_EMOJI]

Something like this:

import emoji
text       = "🤔 🙈 lorum ipsum 😌 de 💕👭👙"
decoded    = text.decode('utf-8')  # Python 2: byte string -> unicode
allchars   = [ch for ch in decoded]
emoji_list = [c for c in allchars if c in emoji.UNICODE_EMOJI]
print emoji_list

[u'\U0001f914', u'\U0001f648', u'\U0001f60c', u'\U0001f495', u'\U0001f46d', u'\U0001f459']

To get back your Emojis try this
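One way to turn such escaped code points back into visible emojis, sketched here in Python 3 (an assumption; the code above is Python 2), is to print the strings themselves instead of the list's repr. The names come from the standard unicodedata module, not from the emoji package.

```python
import unicodedata

# Escaped code points copied from the output above.
escaped = ['\U0001f914', '\U0001f648', '\U0001f60c',
           '\U0001f495', '\U0001f46d', '\U0001f459']

# Joining the strings shows the glyphs rather than their escapes.
restored = ' '.join(escaped)
print(restored)

# Look up each character's Unicode name; 'UNKNOWN' is used if the
# local Unicode database does not know the code point.
for ch in escaped:
    print('U+%04X' % ord(ch), unicodedata.name(ch, 'UNKNOWN'))
```

In Python 2 the same idea would likely be print c.encode('utf-8') for each element, since printing the list shows only the escaped repr.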

Upvotes: 1
