I am trying to convert the emojis in a Twitter dataset into words. I am using the textclean package in R, but when I call replace_emoji, some emojis are replaced by the corresponding words while others are shown in a different format:
df_test$tweet <- textclean::replace_emoji(df_test$tweet)
For example, for "🦈 STARSHARKS OVERVIEW 🦈" I would expect something like:
shark STARSHARKS OVERVIEW shark
Instead, I get:
<f0><9f><a6><88> STARSHARKS OVERVIEW <f0><9f><a6><88>
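The <f0><9f><a6><88> string is the shark emoji (U+1F988) written as its four UTF-8 bytes, each shown in <xx> hex notation. The short base-R sketch below reproduces that notation for illustration. It uses only base functions (utf8ToInt, charToRaw, iconv) and makes no claim about textclean's internals:

```r
# The shark emoji is a single Unicode code point
shark <- "\U0001F988"

# Code point in U+ notation
code_point <- sprintf("U+%X", utf8ToInt(shark))
print(code_point)   # "U+1F988"

# Its UTF-8 encoding is four bytes
utf8_bytes <- as.character(charToRaw(enc2utf8(shark)))
print(utf8_bytes)   # "f0" "9f" "a6" "88"

# iconv with sub = "byte" writes each non-ASCII byte as <xx>,
# which is the notation seen in the output above
byte_form <- iconv(shark, from = "UTF-8", to = "ASCII", sub = "byte")
print(byte_form)    # "<f0><9f><a6><88>"
```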
Another problem is that even apostrophes are converted to this byte format.
This seems odd to me, because some of the emojis are replaced correctly.
I would be thankful for any help, as I am quite new to coding in R.
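The issue seems to be that the shark emoji is not in the lexicon::hash_emojis data table, which replace_emoji uses by default to look up emojis by their byte codes. An emoji missing from that table is left in its <f0><9f><a6><88> byte notation. You can instead pass your own data.table containing these emojis through the emoji_dt argument.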
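The byte-code key does not have to be typed by hand; R can generate it from the emoji itself. A minimal sketch, assuming the data.table package is installed; emoji_dt_from_chars is a hypothetical helper, not part of textclean or lexicon:

```r
library(data.table)

# Hypothetical helper: build rows shaped like lexicon::hash_emojis,
# i.e. byte-code keys (x) and replacement words (y)
emoji_dt_from_chars <- function(emojis, words) {
  stopifnot(length(emojis) == length(words))
  data.table(
    # iconv(sub = "byte") writes each non-ASCII byte as "<xx>",
    # the same notation as in the question's output
    x = iconv(enc2utf8(emojis), from = "UTF-8", to = "ASCII", sub = "byte"),
    y = words
  )
}

custom_emoji_dt <- emoji_dt_from_chars("\U0001F988", "shark")
print(custom_emoji_dt)
```

The resulting table holds the same key as the hand-written "<f0><9f><a6><88>" entry and can be passed as emoji_dt.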
test = "🦈 STARSHARKS OVERVIEW 🦈"
# Map the shark emoji's byte-code key (the notation replace_emoji matches on) to a word
custom_emoji_dt = data.table::data.table(x = "<f0><9f><a6><88>", y = "shark")
textclean::replace_emoji(test, emoji_dt = custom_emoji_dt)
#> [1] "shark STARSHARKS OVERVIEW shark "
Created on 2022-07-25 by the reprex package (v2.0.1)
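Passing custom_emoji_dt on its own replaces the default dictionary, so emojis that lexicon::hash_emojis already covers would no longer be converted. The sketch below appends the extra entry to the default table instead. It also assumes, without confirmation from the question, that the apostrophes are typographic right single quotes (U+2019) rather than ASCII ', and normalizes them with base gsub before calling replace_emoji so they do not end up as "<e2><80><99>". It requires the textclean, lexicon and data.table packages:

```r
library(data.table)

tweets <- c("\U0001F988 STARSHARKS OVERVIEW \U0001F988",
            "it\u2019s live")

# Extra entry keyed by the "<xx>" byte notation that replace_emoji matches on
extra_emoji_dt <- data.table(
  x = iconv("\U0001F988", from = "UTF-8", to = "ASCII", sub = "byte"),
  y = "shark"
)

# Append to the default dictionary so emojis it already knows keep working
emoji_dt <- rbindlist(list(lexicon::hash_emojis, extra_emoji_dt),
                      use.names = TRUE, fill = TRUE)

# Assumption: apostrophes are U+2019; map them to ASCII first
tweets <- gsub("\u2019", "'", tweets, fixed = TRUE)

out <- textclean::replace_emoji(tweets, emoji_dt = emoji_dt)
print(out)
```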