Reputation: 1251
I am trying to replace emojis with their descriptions.
Tweets$text[19]
"I ❤️ flying . ☺️\U0001f44d"
For this task, I use the textclean package. Its default emoji lexicon (lexicon::hash_emojis) includes not only the emoji description (the y column) but also the byte code representation (the x column):
hash_emojis[1:3]
              x                        y
1: <e2><86><95>            up-down arrow
2: <e2><86><99>          down-left arrow
3: <e2><86><a9> right arrow curving left
So the result looks like this:
Tweets$text[19] = replace_emoji(Tweets$text[19], emoji_dt = lexicon::hash_emojis)
Tweets$text[19]
"I red heart <ef><b8><8f> flying . smiling face <ef><b8><8f> thumbs up "
I only want the description, without the byte code representation, because otherwise I have to clean the text a second time. How can I apply only the "y column" to the text? Is there perhaps a better way to deal with emojis in R?
Upvotes: 2
Views: 1115
Reputation: 23598
After using replace_emoji, you can use replace_non_ascii to get rid of the leftover non-ASCII byte codes:
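library(textclean)
text <- "I ❤️ flying . ☺️\U0001f44d"
# replace emojis with their descriptions; some byte codes may remain
replaced <- replace_emoji(text)
# strip the remaining non-ASCII byte codes
replace_non_ascii(replaced)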
"I red heart flying . smiling face thumbs up"
Upvotes: 2
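If a second textclean call is not wanted, the leftover codes can also be stripped with base R. The following is only a minimal sketch. It assumes the codes appear as literal text of the form <xx> (two lowercase hex digits), as in the replace_emoji output shown in the question. The helper name strip_byte_tags is hypothetical and not part of textclean.

```r
# Hypothetical helper (not part of textclean): drop literal "<xx>" byte tags,
# then collapse the whitespace they leave behind.
strip_byte_tags <- function(x) {
  x <- gsub("<[0-9a-f]{2}>", "", x)
  trimws(gsub("\\s+", " ", x))
}

# Sample input copied from the replace_emoji() output in the question.
replaced <- "I red heart <ef><b8><8f> flying . smiling face <ef><b8><8f> thumbs up "
cleaned <- strip_byte_tags(replaced)
print(cleaned)
```

Because gsub is vectorized, the same helper can be applied to a whole character column. Note that the pattern would also remove any genuine text that happens to look like <ab>, so it is best applied only to text that has already been passed through replace_emoji.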