Reputation: 1
I'm using the General Inquirer dictionary with the SentimentAnalysis package and I can't figure out how they assign the sentiment score...
For example, if I run the following code:
library(SentimentAnalysis)
sentiment <- analyzeSentiment(sampledf)
summary(sentiment$SentimentGI)
I'll get an output like this:
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
-0.80000 -0.16667 -0.07692 -0.07313  0.00000  0.66667
What's the scale being used here? -1 to 1? I don't know how to interpret these results.
Thanks!
Upvotes: 0
Views: 1723
Reputation: 29
All sentiment-related scores are calculated based on the formula
(#positive - #negative) / #all
where #positive refers to the number of positive words, #negative to the number of negative words and #all to the total word count. Hence, the sentiment score lies in the interval [-1, +1]. A value of 0 indicates that there are as many positive as negative words in a document.
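To make the formula concrete, here is a minimal base R sketch that computes the score by hand. The two word lists and the function name sentiment_score are made up for illustration and are not the General Inquirer dictionary or part of the package. The package may also preprocess the text (for example stopword removal or stemming), which changes #all, so its numbers will not match this simplified tokenization.

```r
# Hypothetical mini word lists, standing in for a real dictionary
positive_words <- c("good", "great", "happy")
negative_words <- c("bad", "poor", "sad")

# Illustrative helper (not a package function):
# computes (#positive - #negative) / #all for one text
sentiment_score <- function(text, positive, negative) {
  # Lowercase and split on anything that is not a letter
  tokens <- strsplit(tolower(text), "[^a-z]+")[[1]]
  tokens <- tokens[tokens != ""]
  n_pos <- sum(tokens %in% positive)
  n_neg <- sum(tokens %in% negative)
  n_all <- length(tokens)  # an empty text gives 0 / 0, i.e. NaN
  (n_pos - n_neg) / n_all
}

score <- sentiment_score(
  "The food was good but the service was bad and sad",
  positive_words, negative_words
)
print(score)
```

Here the text has 11 words, 1 positive and 2 negative, so the score is (1 - 2) / 11, slightly below zero.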
NB: In practice, the empirical mean or median is not necessarily located at exactly zero, because positive or negative wording may be perceived as stronger or may simply appear more frequently. Hence, one may prefer to choose a different cutoff point to discriminate positive from negative.
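As one possible way to pick such a cutoff, the sketch below splits documents at the sample median instead of at zero. The scores are made-up values, and using the median is an assumption for illustration, not something the package prescribes.

```r
# Made-up sentiment scores for five documents
scores <- c(-0.80, -0.17, -0.08, 0.00, 0.67)

# Use the empirical median as the cutoff instead of 0
cutoff <- median(scores)

# Documents strictly above the cutoff are labeled positive
label <- ifelse(scores > cutoff, "positive", "negative")
print(data.frame(score = scores, label = label))
```

With a cutoff of zero, only the last document would count as positive. With the median cutoff, the document scoring exactly 0 is labeled positive as well.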
Other scores are as follows:
The share of negative words is #negative / #all and lies in [0, 1].
A polarity-style score is (#positive - #negative) / (#positive + #negative).
The share of dictionary words is (#positive + #negative) / #all.
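These three formulas can be checked with a few lines of base R. The helper name other_scores and the counts passed to it are hypothetical, chosen only to show the arithmetic.

```r
# Illustrative helper (not a package function): derives the three
# additional scores from raw word counts
other_scores <- function(n_pos, n_neg, n_all) {
  c(
    negativity       = n_neg / n_all,
    polarity         = (n_pos - n_neg) / (n_pos + n_neg),
    dictionary_share = (n_pos + n_neg) / n_all
  )
}

# Hypothetical counts: 1 positive and 2 negative words out of 11
res <- other_scores(n_pos = 1, n_neg = 2, n_all = 11)
print(res)
```

Note that the polarity-style score is undefined (NaN) when a document contains no dictionary words at all, since the denominator is then zero.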
Upvotes: 2