user12194362

Reputation: 79

Working out the number of emojis in each dataframe row

   Text
0  🤔 🙈 me así se 😌 ds 💕👭👙 hello 👩🏾‍🎓
1  🤔 🙈 me así se 😌 ds 💕👭👙 hello
2  🤔 🙈 me así se 😌 ds
3  🤔 🙈 me así

I want to add a column called 'Emoji Count' to the dataframe above (df), holding the number of emojis in each row.

For instance, the first row would have a count of 7, as it contains 7 emojis.

I understand that to create a new column based on information in the "Text" column I would enter:

df["Emoji Count"] = df["Text"].....

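I was able to write a function that counts the number of emojis in a string, but I wasn't able to apply it to my dataframe:

import emoji
import regex


def split_count(info):
    # Split the text into grapheme clusters so multi-code-point emojis stay whole.
    emoji_list = []
    data = regex.findall(r'\X', info)
    for word in data:
        # Keep the cluster if any of its characters is a known emoji.
        if any(char in emoji.UNICODE_EMOJI for char in word):
            emoji_list.append(word)

    return len(emoji_list)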

Upvotes: 3

Views: 932

Answers (2)

Roy2012

Reputation: 12523

Just do:

df["Emoji Count"] = df.Text.apply(split_count)

or

df["Emoji Count"] = df['Text'].apply(split_count)

This applies your function to each cell of the Text column and assigns the results to the "Emoji Count" column.
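A minimal sketch of that pattern, written so it runs without the emoji package. Note that count_high_codepoints is a hypothetical stand-in for split_count, not the asker's function: it crudely assumes that any code point at or above U+1F000 is an emoji, which is only good enough for illustration.

```python
import pandas as pd


def count_high_codepoints(text):
    # Hypothetical stand-in for split_count (assumption: every code point
    # at or above U+1F000 counts as an emoji; this is a crude heuristic).
    return sum(1 for ch in text if ord(ch) >= 0x1F000)


# Two shortened rows from the question, with the emojis written as escapes.
df = pd.DataFrame(
    {
        "Text": [
            "\U0001F914 \U0001F648 me así se \U0001F60C ds",
            "\U0001F914 \U0001F648 me así",
        ]
    }
)

# Series.apply calls the function once per cell and returns a new Series.
df["Emoji Count"] = df["Text"].apply(count_high_codepoints)
print(df["Emoji Count"].tolist())  # [3, 2]
```

Swapping count_high_codepoints for your own split_count should give the same shape of result, one integer per row.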

Upvotes: 2

Gustav Rasmussen

Reputation: 3971

I built the dataframe manually so that it holds the emojis, then simplified your split_count(info) function and applied it to the dataframe to create the new "Emoji Count" column:

import emoji
import pandas as pd

e_1 = emoji.emojize(":thinking_face:")
e_2 = emoji.emojize(":see-no-evil_monkey:")
e_3 = emoji.emojize(":relieved_face:")
e_4 = emoji.emojize(":two_hearts:")
e_5 = emoji.emojize(":two_women_holding_hands:")
e_6 = emoji.emojize(":bikini:")
e_7 = emoji.emojize(":woman_student_medium-dark_skin_tone:")

df = pd.DataFrame(
    [
        [f"{e_1}{e_2} me así se {e_3} ds {e_4}{e_5}{e_6} hello {e_7}"],
        [f"{e_1}{e_2} me así se {e_3} ds {e_4}{e_5}{e_6} hello"],
        [f"{e_1}{e_2} me así se {e_3} ds"],
        [f"{e_1}{e_2} me así"],
    ],
    columns=["Text"],
)


def split_count(info):
    return len([c for c in info if c in emoji.UNICODE_EMOJI])


df["Emoji Count"] = df["Text"].apply(split_count)
print(df)

Output:

                                 Text       Emoji Count
0  🤔🙈 me así se 😌 ds 💕👭👙 hello 👩🏾‍🎓              7
1  🤔🙈 me así se 😌 ds 💕👭👙 hello                 6
2  🤔🙈 me así se 😌 ds                             3
3  🤔🙈 me así                                     2
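One thing worth checking when counting character by character: the last emoji in row 0 is a zero-width-joiner sequence made up of several code points, so a per-character membership test may match it more than once, depending on which code points the library's table contains. The stdlib-only sketch below (the variable name woman_student is just illustrative) lists the code points involved; this is the case the \X grapheme split in the question's function is meant to handle.

```python
import unicodedata

# Woman student, medium-dark skin tone: base emoji, skin tone modifier,
# zero width joiner, graduation cap.
woman_student = "\U0001F469\U0001F3FE\u200D\U0001F393"

# One visible emoji, but four code points when iterated as a Python str.
print(len(woman_student))  # 4

for ch in woman_student:
    # Fall back to a placeholder if this Python build has no name for it.
    print(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
```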

Upvotes: 1
