Reputation: 79
Hello, I am trying to find all emojis in downloaded tweets using Python 2.7.
I tried the following code:
import os
import codecs
import emoji
from nltk.tokenize import word_tokenize
def extract_emojis(token):
    emoji_list = []
    if token in emoji.UNICODE_EMOJI:
        emoji_list.append(token)
    return emoji_list
for tweet in os.listdir(tweets_path):
    with codecs.open(tweets_path+tweet, 'r', encoding='utf-8') as input_file:
        line = input_file.readline()
        while line:
            line = word_tokenize(line)
            for token in line:
                print extract_emojis(token)
            line = input_file.readline()
However, I only get empty lists instead of the emojis. For example, given the following tweet:
schuld van de sossen 😡 SP.a: wij hebben niks gedaan 😴 Groen: we gaan energie VERBIEDEN!
the output of the code is
[]
instead of the desired output:
[😡, 😴]
Any help?? Thanks!
Upvotes: 4
Views: 5619
Reputation: 2730
Try this:
import emoji
text = "Hello! π How are you doing today? π It’s a beautiful day, isn’t it? π Don’t forget to take a break and enjoy a cup of coffee ☕ or tea 🍵. Remember, you’re doing great! π Keep up the good work! 💪"
emojis = [c for c in text if c in emoji.EMOJI_DATA]
print(emojis)
Output:
['π', 'π', 'π', '☕', '🍵', 'π', '💪']
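One caveat with scanning character by character: some emojis are sequences of several code points (flags, skin-tone variants), so a per-character test only ever sees their parts. The sketch below uses just the standard library to show the effect; the two sample strings are my own illustration, not taken from the answer.

```python
# Some emojis are built from several code points, so iterating over a
# string character by character splits them into their components.
flag = "\U0001F1E7\U0001F1EA"    # Belgian flag: two regional indicator symbols
thumbs = "\U0001F44D\U0001F3FD"  # thumbs up followed by a skin tone modifier

flag_parts = list(flag)
thumbs_parts = list(thumbs)

print(len(flag_parts), flag_parts)      # two code points, not one emoji
print(len(thumbs_parts), thumbs_parts)  # base emoji and modifier separately
```

If whole sequences matter, recent releases of the emoji package should offer emoji.emoji_list(text) for this, as far as I know; check the version you have installed.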
Upvotes: 0
Reputation: 897
There are various ways to extract emojis from a string in Python.
One of the most prominent is the emoji library.
If you are processing a file, make sure you read it with encoding utf-8 (and, when saving, utf-8-sig too).
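A minimal sketch of the encoding point, written for Python 3 and using a temporary file as a stand-in for a real tweet file (the file name and contents are made up for illustration). It shows that a file saved as utf-8-sig starts with a byte order mark, which reading with utf-8-sig strips and plain utf-8 leaves in place.

```python
import os
import tempfile

tweet = "niks gedaan \U0001F634"  # sample text with one emoji (illustrative)

# Write the sample with a BOM, as utf-8-sig does when saving.
path = os.path.join(tempfile.mkdtemp(), "tweet.txt")
with open(path, "w", encoding="utf-8-sig") as f:
    f.write(tweet)

# Reading as plain utf-8 keeps the BOM as a leading U+FEFF character.
with open(path, encoding="utf-8") as f:
    with_bom = f.read()

# Reading as utf-8-sig removes it, so the text round-trips unchanged.
with open(path, encoding="utf-8-sig") as f:
    clean = f.read()

print(repr(with_bom[0]))  # '\ufeff'
print(clean == tweet)     # True
```

A stray U+FEFF at the start of the first token is easy to miss, so it is worth ruling out before blaming the emoji lookup.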
The code below lists all the emojis present in a string, along with the number of emojis and the type of each one.
Code:
# import required libraries
import re
import emoji
from emoji import UNICODE_EMOJI
# note: assumes an emoji release in which UNICODE_EMOJI maps each emoji
# directly to its ':name:' alias; other releases may structure this table differently
# getting all emojis as a list, longest first so multi-character emojis match whole
all_emojis = sorted(UNICODE_EMOJI.keys(), key=len, reverse=True)
# defining sentence
sentence = "schuld van de sossen 😡 SP.a: wij hebben niks gedaan 😴 Groen: we gaan energie VERBIEDEN!"
# getting Emoji Count
emoji_count = sum([sentence.count(emoj) for emoj in UNICODE_EMOJI])
# listing all Emojis (escaped alternation, so characters such as '#' or '*' are not read as regex syntax)
found_emojis = re.findall('|'.join(re.escape(emoj) for emoj in all_emojis), str(sentence))
listed_emojis = ','.join(found_emojis)
# listing all Emoji Types
emoji_types = ','.join([UNICODE_EMOJI[detect_emoji].upper()[1:-1] for detect_emoji in found_emojis])
# Displaying Sentence, Emoji Count, Emojis and Emoji Types
print(f"Sentence: {sentence}\nListed Emojis: {listed_emojis}\nCount: {emoji_count}\nEmoji Types: {emoji_types}")
Output:
Sentence: schuld van de sossen 😡 SP.a: wij hebben niks gedaan 😴 Groen: we gaan energie VERBIEDEN!
Listed Emojis: 😡,😴
Count: 2
Emoji Types: POUTING_FACE,SLEEPING_FACE
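If the emoji package is not available, or its UNICODE_EMOJI table differs in your installed release, a rough standard-library approximation is possible. The sketch below is an alternative under stated assumptions, not the method above: the helper name looks_like_emoji and the code point ranges are my own choices and only cover common single-code-point emojis. unicodedata.name supplies the Unicode character name for each match.

```python
import unicodedata

# Rough, hand-picked code point ranges (an approximation, not a full
# emoji table): pictographs/emoticons, then misc symbols and dingbats.
EMOJI_RANGES = [(0x1F300, 0x1FAFF), (0x2600, 0x27BF)]

def looks_like_emoji(ch):
    """Return True if the single character falls in one of the ranges."""
    return any(lo <= ord(ch) <= hi for lo, hi in EMOJI_RANGES)

# Same sentence as above, with the two emojis written as escapes.
sentence = ("schuld van de sossen \U0001F621 SP.a: wij hebben niks gedaan "
            "\U0001F634 Groen: we gaan energie VERBIEDEN!")

found = [ch for ch in sentence if looks_like_emoji(ch)]
names = [unicodedata.name(ch, "UNKNOWN").replace(" ", "_") for ch in found]

print(found)       # ['😡', '😴']
print(len(found))  # 2
print(names)       # ['POUTING_FACE', 'SLEEPING_FACE']
```

The ranges will misclassify some symbols and miss multi-code-point emojis, so treat this as a fallback rather than a replacement for a maintained emoji table.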
I hope this is helpful. If anyone has a question, please write here and I will try to help. :)
Upvotes: 0
Reputation: 3669
This works in Python 2:
# -*- coding: utf-8 -*-
import emoji
x = "schuld van de sossen 😡 SP.a: wij hebben niks gedaan 😴 Groen: we gaan energie VERBIEDEN!"
# split on whitespace and keep the tokens that are emojis
[i for i in x.split() if unicode(i, "utf-8") in emoji.UNICODE_EMOJI]
# Output
['\xf0\x9f\x98\xa1', '\xf0\x9f\x98\xb4']
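The output above is a list of raw UTF-8 byte strings, which is how Python 2 displays them. As a small sketch (written for Python 3, where byte strings need the b prefix), decoding each one gives back the readable emoji:

```python
# The two byte strings from the output above, written as Python 3 bytes.
raw = [b'\xf0\x9f\x98\xa1', b'\xf0\x9f\x98\xb4']

# Decode each UTF-8 byte sequence into a one-character string.
decoded = [b.decode('utf-8') for b in raw]

print(decoded)  # ['😡', '😴']
```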
Upvotes: 1
Reputation: 2605
Make sure your text is decoded from UTF-8: text.decode('utf-8')
To locate all emoji in your text, split the text character by character: [ch for ch in decoded]
Save all emoji in a list: [c for c in allchars if c in emoji.UNICODE_EMOJI]
Something like this:
# -*- coding: utf-8 -*-
import emoji
text = "🤔 🙈 lorum ipsum 😌 de 💕👭👙"
# Python 2: decode the byte string into unicode first
decoded = text.decode('utf-8')
# split into individual characters
allchars = [ch for ch in decoded]
# keep only the characters that are emoji
emoji_list = [c for c in allchars if c in emoji.UNICODE_EMOJI]
print emoji_list
[u'\U0001f914', u'\U0001f648', u'\U0001f60c', u'\U0001f495', u'\U0001f46d', u'\U0001f459']
To get back your Emojis try this
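One possible way to do that (my assumption, not necessarily what the answerer had in mind): the escapes appear because printing a list shows each element's repr, so joining or printing the characters themselves shows the glyphs instead. A Python 3 sketch using the code points from the output above:

```python
# Code points copied from the list printed above.
found = ['\U0001f914', '\U0001f648', '\U0001f60c',
         '\U0001f495', '\U0001f46d', '\U0001f459']

# Joining the characters into one string prints the emojis themselves.
joined = ' '.join(found)
print(joined)  # 🤔 🙈 😌 💕 👭 👙
```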
Upvotes: 1