Schlaeggi

Reputation: 35

Replacing emojis with textclean isn't fully working for me in R

I am trying to convert the emojis in a Twitter dataset into words. I am using the "textclean" package in R, but when I call replace_emoji, some emojis are replaced by the corresponding words while others are shown in a different format:

df_test$tweet <- textclean::replace_emoji(df_test$tweet)

My expected output for, e.g., "🦈 STARSHARKS OVERVIEW 🦈" would be something like:

shark STARSHARKS OVERVIEW shark

instead I get:

<f0><9f><a6><88> STARSHARKS OVERVIEW <f0><9f><a6><88>
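The <f0><9f><a6><88> sequence is the UTF-8 encoding of 🦈 (U+1F988), written one byte at a time as <xx> hex. Base R's iconv() with sub = "byte" produces exactly this notation, so it can be used to look up the byte code of any character that is not being replaced. A small diagnostic sketch in base R; the helper name to_bytes is illustrative only:

```r
# Illustrative helper: show the byte-escaped form of a string.
# Non-ASCII bytes become "<xx>" (lowercase hex); ASCII characters are kept as-is.
to_bytes <- function(s) iconv(s, from = "UTF-8", to = "ASCII", sub = "byte")

to_bytes("\U0001F988")  # shark emoji, U+1F988
#> [1] "<f0><9f><a6><88>"
to_bytes("\u2019")      # right single quotation mark (typographic apostrophe)
#> [1] "<e2><80><99>"
```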

Another problem I encounter is that even apostrophes are converted to this format.
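A likely explanation for the apostrophes, assuming the tweets contain the typographic apostrophe ’ (U+2019) rather than the ASCII ', is that this character has no entry in the emoji table either and so would surface as <e2><80><99>. One hedged workaround under that assumption is to normalise curly quotes to ASCII before calling replace_emoji(); straighten_quotes is a hypothetical helper, not part of textclean:

```r
# Hypothetical helper: map typographic quotes to their ASCII equivalents
# so they are not byte-escaped later on.
straighten_quotes <- function(x) {
  x <- gsub("[\u2018\u2019]", "'", x)   # left/right single quotation marks
  gsub("[\u201C\u201D]", "\"", x)       # left/right double quotation marks
}

tweet <- "It\u2019s a \u201Cshark\u201D"
straighten_quotes(tweet)
#> [1] "It's a \"shark\""
```

On the question's data this would be applied first, e.g. textclean::replace_emoji(straighten_quotes(df_test$tweet)).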

This seems odd to me, because some of the emojis are replaced correctly.

I would be thankful for any help, as I am quite new to coding in R.

Upvotes: 2

Views: 113

Answers (1)

mhovd

Reputation: 4087

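The issue seems to be that the shark emoji is not in the lexicon::hash_emojis data table, which replace_emoji uses as its default lookup (the emoji_dt argument). Characters without an entry are left as byte codes such as <f0><9f><a6><88>. As such, you need to define your own data.table that maps these byte codes to words and pass it via emoji_dt.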

test = "🦈 STARSHARKS OVERVIEW 🦈"

custom_emoji_dt = data.table::data.table(x = "<f0><9f><a6><88>", y = "shark")

textclean::replace_emoji(test, emoji_dt = custom_emoji_dt)
#> [1] "shark STARSHARKS OVERVIEW shark "

Created on 2022-07-25 by the reprex package (v2.0.1)
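Note that passing only custom_emoji_dt replaces the default lookup entirely, so on a full dataset the emojis that lexicon::hash_emojis does cover would no longer be converted. A sketch that appends the missing entries to the default table instead, assuming hash_emojis has the x (byte code) and y (description) columns that the custom table above mirrors; the names tweets, extra and my_emoji_dt are illustrative:

```r
# Sample tweets standing in for df_test$tweet from the question.
tweets <- c("\U0001F988 STARSHARKS OVERVIEW \U0001F988",
            "Good morning \U0001F600")

# Extra entries for emojis missing from lexicon::hash_emojis. The keys are
# generated with iconv(sub = "byte") so they use the same "<xx>" notation.
extra <- data.table::data.table(
  x = iconv("\U0001F988", from = "UTF-8", to = "ASCII", sub = "byte"),
  y = "shark"
)

# Keep every default entry and append the extra rows, so emojis that were
# already handled are still replaced. as.character() guards against factors.
my_emoji_dt <- data.table::data.table(
  x = c(as.character(lexicon::hash_emojis$x), extra$x),
  y = c(as.character(lexicon::hash_emojis$y), extra$y)
)

out <- textclean::replace_emoji(tweets, emoji_dt = my_emoji_dt)
out
```

On the question's data this would be df_test$tweet <- textclean::replace_emoji(df_test$tweet, emoji_dt = my_emoji_dt).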

Upvotes: 2
