ojp

Reputation: 1033

Efficient and pythonic way to search through a long string

I have written some code that searches a string and returns True if it contains an emoji. The strings live in a column of a pandas DataFrame, and both the individual strings and the DataFrame itself can be arbitrarily long. I then store the boolean results in a new column of the DataFrame.

Here is my code:

import emoji

contains_emoji = []
            
for row in df['post_text']:
    emoji_found = False
    for char in row:
        if emoji.is_emoji(char):
            emoji_found = True
            break
    contains_emoji.append(emoji_found)

df['has_emoji'] = contains_emoji

In an effort to get slicker, I was wondering if anyone could recommend a faster, shorter, or more pythonic way of searching like this?

Upvotes: 1

Views: 50

Answers (3)

ZachW

Reputation: 48

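You can use str.contains with a regex character class that matches a range of emoji code points. Note that the range below covers roughly the Emoticons block only, so it will miss emoji outside it, such as 🌍 (U+1F30D):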

df['has_emoji'] = df['post_text'].str.contains(r'[\U0001f600-\U0001f650]')

For reference, here is a link to the source code for emoji.emoji_count(): https://github.com/carpedm20/emoji/blob/master/emoji/core.py
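To see what the character range above actually covers, here is a small stdlib-only sketch. It reuses the same pattern with `re` directly (no pandas), and the sample strings are illustrative assumptions, not data from the question:

```python
import re

# Same character range as in the str.contains call above.
# U+1F600 to U+1F64F is the Unicode "Emoticons" block, so emoji that
# live in other blocks are not matched by this pattern.
EMOTICON_RANGE = re.compile(r'[\U0001f600-\U0001f650]')

samples = ['🌍', '😂', 'text 😃', 'abc']  # illustrative sample strings

# search() returns a match object or None; bool() turns that into a flag.
matches = [bool(EMOTICON_RANGE.search(s)) for s in samples]
print(matches)
```

The globe emoji (U+1F30D) falls outside the range and is reported as not found, so a fixed range like this is only a reasonable choice when the emoji of interest are known to sit inside it.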

Upvotes: 1

ScottC

Reputation: 4105

Use emoji.emoji_count():

import emoji
import pandas as pd

# Create example dataframe
df = pd.DataFrame({'post_text':['🌍', '😂', 'text 😃', 'abc']})

# Create column based on emoji within text
df['has_emoji'] = df['post_text'].apply(lambda x: emoji.emoji_count(x) > 0)

# print dataframe
print(df)

OUTPUT:

  post_text  has_emoji
0         🌍       True
1         😂       True
2    text 😃       True
3        abc      False

Upvotes: 2

najeem

Reputation: 1921

Why not just:

df["has_emoji"] = df.post_text.apply(emoji.emoji_count) > 0
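One trade-off worth noting: emoji_count tallies every emoji in the string, whereas the question only needs to know whether at least one exists. A short-circuiting variant could look like the sketch below. The names `contains_any` and `in_emoticons_block` are hypothetical helpers, and the predicate is a simplified stand-in so the example runs without the emoji package:

```python
from typing import Callable


def in_emoticons_block(char: str) -> bool:
    """Hypothetical stand-in predicate: True only for U+1F600 to U+1F64F."""
    return 0x1F600 <= ord(char) <= 0x1F64F


def contains_any(text: str, predicate: Callable[[str], bool]) -> bool:
    """Return True as soon as one character satisfies the predicate."""
    # any() consumes the generator lazily, so it stops at the first hit,
    # just like the explicit loop with `break` in the question.
    return any(predicate(char) for char in text)


posts = ['😂', 'text 😃', 'abc']  # illustrative sample strings
flags = [contains_any(post, in_emoticons_block) for post in posts]
print(flags)
```

With the emoji package installed, passing emoji.is_emoji (the function used in the question) as the predicate should reproduce the original loop, and the result could be assigned with something like df['post_text'].map(lambda s: contains_any(s, emoji.is_emoji)). Whether this beats emoji_count in practice depends on the data, so it is worth timing both on a realistic sample.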

Upvotes: 2
