Reputation: 161
I am working on a project that uses tweets containing emojis and emoticons. My main goal is to get a combined sentiment score for each tweet (text + emoticons). The emoticons are probably the most meaningful part of the data, so they cannot be neglected. I have converted the encoding of the emojis and emoticons via iconv, but I am only getting a sentiment score for the text, not for the emojis. I am using VADER sentiment for this, but if another sentiment library or lexicon can also score the emojis, a pointer to it would be very helpful and highly appreciated.
Tweets:
dput(df_emoji$Description)
c("DoorDash or Uber method asap<f0><9f><98><ad> cause I be starving<f0><9f><98><ad><f0><9f><98><ad>",
"such a real ahh niqq cuz I be having myself weak asl<f0><9f><98><82>",
"shii made me laugh so fuccin hard bro<f0><9f><98><82><f0><9f><98><82><f0><9f><98><82><f0><9f><98><82>",
"Hart and Will Ferrell made a Gem in Get hard fr<f0><9f><98><82><f0><9f><98><82><f0><9f><98><82>",
"@NigerianAmazon Chill<f0><9f><a4><a3><f0><9f><98><ad>", "so bomedy <f0><9f><98><82><f0><9f><98><82><f0><9f><98><82>",
"is that ass Gotdam<f0><9f><98><82><f0><9f><98><82><f0><9f><98><82>",
"wild<f0><9f><98><82><f0><9f><98><82><f0><9f><98><82>",
"them late night DoorDash<e2><80><99>s be goin crazy<f0><9f><a4><a3>",
"of the week<f0><9f><98><82><f0><9f><98><82><f0><9f><98><82><f0><9f><98><82>"
)
Code:
library(tidyr)  # separate()
library(vader)  # vader_df()

# Convert non-ASCII characters to <xx> byte placeholders
emoji_senti <- data.frame(text = iconv(data_sample$text, "latin1", "ASCII", "byte"),
                          stringsAsFactors = FALSE)
column1 <- separate(emoji_senti, text, into = c("Bytes", "Description"), sep = "\\ ")
column2 <- separate(emoji_senti, text, into = c("Bytes", "Description"), sep = "^[^\\s]*\\s")
df_emoji <- data.frame(Bytes = column1$Bytes, Description = column2$Description)

# Score each tweet and stack the results row by row
allvals_emoji <- NULL
for (i in seq_along(df_emoji$Description)) {
  outs <- vader_df(df_emoji$Description[i])
  allvals_emoji <- rbind(allvals_emoji, outs)
}
allvals_emoji
Notice that the first tweet gets scores only for its 9 English words; the converted Unicode byte sequences for the emojis are not scored.
#    word_scores                                compound   pos   neu   neg but_count
#  1 {0, 0, 0, 0, 0, 0, 0, 0, 0}                   0.000 0.000 1.000 0.000         0
#  2 {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1.9, 0, 0}   -0.440 0.000 0.805 0.195         0
#  3 {0, 0, 0, 2.6, 0, 0, -0.67835, 0, 0}          0.444 0.293 0.570 0.137         0
#  4 {0, 0, 0, 0, 0, 0, 0, 0, 0, -0.4, 0}         -0.103 0.000 0.877 0.123         0
#  5 {0, 0}                                        0.000 0.000 1.000 0.000         0
#  6 {0, 0, 0, 0}                                  0.000 0.000 1.000 0.000         0
#  7 {0, 0, -2.5, 0, 0}                           -0.542 0.000 0.533 0.467         0
#  8 {0, 0}                                        0.000 0.000 1.000 0.000         0
#  9 {0, 0, 0, 0, 0, 0, 0}                         0.000 0.000 1.000 0.000         0
# 10 {0, 0, 0, 0}                                  0.000 0.000 1.000 0.000         0
Upvotes: 1
Views: 606
Reputation: 103
Check this discussion: VaderSentiment: unable to update emoji sentiment score
"Vader transforms emojis to their word representation prior to extracting sentiment"
From what I tested, the emojis' values are hidden in the output but are still part of the score and can influence it. If you need the score for a specific emoji, load library(lexicon) and run data.frame(hash_emojis_identifier) (a data frame that contains identifiers for emojis and matches them to a lexicon format) and data.frame(hash_sentiment_emojis) to get each emoji's sentiment value. Those tables alone do not tell you how much a series of emojis contributed to the total message score, though, because that depends on how vader combines their values into the score.
You can still estimate the impact of the emojis by taking the difference between the compound score of the message with emojis and the score of the same message without them:
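library(vader)  # vader_df()

# data_sample: character vector of tweets with emojis
# data_samplewithout: the same tweets with the emojis removed
allvals <- NULL
for (i in seq_along(data_sample)) {
  outs <- vader_df(data_sample[i])
  allvals <- rbind(allvals, outs)
}
allvalswithout <- NULL
for (i in seq_along(data_samplewithout)) {
  outs <- vader_df(data_samplewithout[i])
  allvalswithout <- rbind(allvalswithout, outs)
}
# Difference in compound score attributable to the emojis
emojiscore <- allvals$compound - allvalswithout$compound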
Then:
allvals <- cbind(allvals,emojiscore)
For large datasets it would be ideal to automate removing the emojis from the texts. Here I just removed them manually to illustrate the approach.
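One way the removal step might be automated is sketched below in base R. It assumes the tweets are already in the `<xx>` byte-placeholder form that iconv produces in the question, and it treats only 4-byte sequences starting with `<f0>` as emojis. That covers the emojis in the sample tweets, but it is an assumption: some emojis use other byte patterns and would be missed. `strip_emoji_bytes` is a hypothetical helper name, not a function from vader or lexicon.

```r
# Hypothetical helper (not part of vader or lexicon): removes emojis that
# appear as 4-byte UTF-8 placeholders such as "<f0><9f><98><82>".
# Assumption: every emoji of interest starts with the <f0> lead byte.
strip_emoji_bytes <- function(x) {
  # <f0> followed by exactly three more <xx> byte placeholders
  out <- gsub("<f0>(<[0-9a-f]{2}>){3}", " ", x)
  # Collapse the whitespace left behind and trim both ends
  trimws(gsub("\\s+", " ", out))
}

# Two tweets taken from the question's sample data
data_sample <- c(
  "wild<f0><9f><98><82><f0><9f><98><82><f0><9f><98><82>",
  "them late night DoorDash<e2><80><99>s be goin crazy<f0><9f><a4><a3>"
)
data_samplewithout <- strip_emoji_bytes(data_sample)
```

Non-emoji placeholders such as the apostrophe `<e2><80><99>` are left untouched, so the text scored with and without emojis differs only by the emojis themselves.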
Upvotes: 1