Reputation: 1033
I wrote some code that searches a string and returns True if it contains an emoji. The strings are in a column of a pandas DataFrame, and both the individual strings and the DataFrame itself can be arbitrarily long. I then add a new column to the DataFrame holding these boolean results.
Here is my code:
import emoji

contains_emoji = []
for row in df['post_text']:
    emoji_found = False
    for char in row:
        if emoji.is_emoji(char):
            emoji_found = True
            break
    contains_emoji.append(emoji_found)
df['has_emoji'] = contains_emoji
In an effort to get slicker, I was wondering if anyone could recommend a faster, shorter, or more pythonic way of searching like this?
Upvotes: 1
Views: 50
Reputation: 48
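You can use str.contains with a regex character class. Note that the range below covers only U+1F600 through U+1F650 (essentially the emoticons block), so it matches faces like 😂 but not emoji outside that range, such as 🌍: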
df['has_emoji'] = df['post_text'].str.contains(r'[\U0001f600-\U0001f650]')
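A quick check of what the character class above does and does not match, sketched with only the standard library re module. The sample strings and the names EMOTICON_RANGE, samples and results are made up for illustration:

```python
import re

# Same character class as above: code points U+1F600 through U+1F650 only.
EMOTICON_RANGE = re.compile(r'[\U0001f600-\U0001f650]')

# Hypothetical sample strings, not taken from any real dataset.
samples = ['🌍', '😂', 'text 😃', 'abc']

# search() returns a match object or None; bool() turns that into True/False.
results = [bool(EMOTICON_RANGE.search(s)) for s in samples]
print(results)
```

Here 🌍 is U+1F30D, which sits below the start of the range, so it comes back False even though it is an emoji. If you need broader coverage you would have to extend the class with more ranges or fall back to the emoji package.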
For reference here is a link to the source code for emoji.emoji_count(): https://github.com/carpedm20/emoji/blob/master/emoji/core.py
Upvotes: 1
Reputation: 4105
Use emoji.emoji_count():
import emoji
import pandas as pd
# Create example dataframe
df = pd.DataFrame({'post_text':['🌍', '😂', 'text 😃', 'abc']})
# Create column based on emoji within text
df['has_emoji'] = df['post_text'].apply(lambda x: emoji.emoji_count(x) > 0)
# print dataframe
print(df)
OUTPUT:
post_text has_emoji
0 🌍 True
1 😂 True
2 text 😃 True
3 abc False
Upvotes: 2
Reputation: 1921
Why not just:
df["has_emoji"] = df.post_text.apply(emoji.emoji_count) > 0
Upvotes: 2