maybeyourneighour

Reputation: 494

How can I remove emojis from a dataframe?

I know that

test = []
for item in my_texts:
    test.append(item.encode('ascii', 'ignore').decode('ascii'))

removes emojis from a list. But how can I remove emojis from a dataframe? When I try

a = []
for item in goldtest['Text']:
    a.append(item.encode('ascii', 'ignore').decode('ascii'))

I get only the last entry of goldtest. When I try the code on the whole dataframe, I get "AttributeError: 'DataFrame' object has no attribute 'encode'".

Upvotes: 4

Views: 12781

Answers (3)

Guru Stron

Reputation: 141575

You can use the emoji package:

import emoji
import pandas as pd
df = pd.DataFrame(data={'str_data':['يااا واجعوط هذا راه باغي يبدع فالسانكيام😭🤦‍♀️']})
df['str_data'] = df['str_data'].apply(lambda s: emoji.replace_emoji(s, ''))
df

Output:

str_data
يااا واجعوط هذا راه باغي يبدع فالسانكيام
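
If installing the emoji package is not an option, a rough standard-library alternative is a regular expression over emoji-related code point ranges. This is only a sketch: the ranges below are an approximation chosen for illustration, not an exhaustive emoji list, and the name remove_emoji_approx is hypothetical. Unlike ASCII-only filtering, it leaves Arabic and other non-Latin text in place.

```python
import re

# Assumption: an approximate, non-exhaustive set of emoji-related ranges.
EMOJI_PATTERN = re.compile(
    "["
    "\U0001F000-\U0001FAFF"  # pictographs, emoticons, transport, supplemental symbols
    "\u2600-\u27BF"          # miscellaneous symbols and dingbats
    "\uFE0F"                 # variation selector used in emoji sequences
    "\u200D"                 # zero-width joiner used in emoji sequences
    "]+"
)

def remove_emoji_approx(text):
    """Remove characters in the approximate emoji ranges above."""
    return EMOJI_PATTERN.sub("", text)

# Crying face followed by a joined "facepalm + female sign" sequence.
sample = "hi \U0001F62D\U0001F926\u200D\u2640\uFE0F"
print(repr(remove_emoji_approx(sample)))  # 'hi '

# With pandas, assuming a DataFrame df with a 'str_data' column:
# df['str_data'] = df['str_data'].map(remove_emoji_approx)
```

The trade-off is that the pattern also removes non-emoji symbols in those ranges (check marks, for example) and strips the zero-width joiner, which some scripts use legitimately, so the emoji package remains the more precise choice.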

Upvotes: 3

Skynet

Reputation: 45

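This removes every character other than ASCII letters and digits from the given column. That includes emojis, but also spaces, punctuation, and non-Latin letters.

import re

# regex=True is required on pandas 2.0 and later, where str.replace matches literally by default
goldtest['Text'] = goldtest['Text'].str.replace('[^A-Za-z0-9]', '', flags=re.UNICODE, regex=True)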
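
To see what this pattern keeps and drops, it can be run on a plain string with the standard re module. The sketch below is illustrative only: the helper names and the whitespace-preserving variant are assumptions, not part of the answer above.

```python
import re

# Pattern from the answer: drop everything except ASCII letters and digits.
STRICT = re.compile(r'[^A-Za-z0-9]')
# Hypothetical variant: also keep whitespace so words stay separated.
KEEP_SPACES = re.compile(r'[^A-Za-z0-9\s]')

def strip_strict(text):
    return STRICT.sub('', text)

def strip_keep_spaces(text):
    return KEEP_SPACES.sub('', text)

sample = 'Good morning \U0001F62D!'
print(repr(strip_strict(sample)))       # 'Goodmorning'
print(repr(strip_keep_spaces(sample)))  # 'Good morning '

# With pandas, either pattern string can be passed to
# Series.str.replace(pattern, '', regex=True).
```

The strict pattern glues words together, so the variant with \s is usually closer to what is wanted for text columns.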

Upvotes: 0

ivallesp

Reputation: 2202

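This is the equivalent code for pandas. It operates column by column and returns a new DataFrame, so assign the result back. Like the loop in the question, it drops every non-ASCII character, not only emojis.

df = df.astype(str).apply(lambda x: x.str.encode('ascii', 'ignore').str.decode('ascii'))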
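
For a single column such as goldtest['Text'] from the question, the same encode/decode round trip can be wrapped in a small helper and mapped over the column. This is a minimal sketch; the name strip_non_ascii is hypothetical, and the str() call is an assumption added so non-string values do not raise.

```python
def strip_non_ascii(text):
    """Drop every character that cannot be encoded as ASCII."""
    return str(text).encode('ascii', 'ignore').decode('ascii')

texts = ['hello \U0001F62D', 'caf\u00e9', 42]
cleaned = [strip_non_ascii(t) for t in texts]
print(cleaned)  # ['hello ', 'caf', '42']

# With pandas, assuming a DataFrame named goldtest with a 'Text' column:
# goldtest['Text'] = goldtest['Text'].map(strip_non_ascii)
```

As the second item shows, accented letters are removed along with emojis, so this approach only suits text that is expected to be plain ASCII.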

Upvotes: 10
