Stat

Reputation: 681

R: SentimentAnalysis Package

Does anyone know why the aggregate argument of the analyzeSentiment function in the R package SentimentAnalysis does not group the sentiment scores? Here is a simple reproducible example:

> library(SentimentAnalysis)
> documents <- c("Wow, I really like the new light sabers!",
+                "That book was excellent.",
+                "R is a fantastic language.",
+                "The service in this restaurant was miserable.",
+                "This is neither positive or negative.",
+                "The waiter forget about my dessert -- what poor service!")
> Group <- factor(c(1, 1, 2, 2, 3, 3))
> Test_data <- data.frame(documents = documents, Group = Group, stringsAsFactors = FALSE)
> sentiment <- analyzeSentiment(x = Test_data$documents, aggregate = Test_data$Group)
> Test_data$SentimentQDAP <- sentiment$SentimentQDAP
> Test_data
                                                 documents Group SentimentQDAP
1                 Wow, I really like the new light sabers!     1     0.3333333
2                                 That book was excellent.     1     0.5000000
3                               R is a fantastic language.     2     0.5000000
4            The service in this restaurant was miserable.     2    -0.3333333
5                    This is neither positive or negative.     3     0.0000000
6 The waiter forget about my dessert -- what poor service!     3    -0.4000000


The help page for analyzeSentiment describes the argument as follows: "aggregate: A factor variable by which documents can be grouped. This helpful when joining e.g. news from the same day or move reviews by the same author".

My question is: why are the scores not the same within each group?

Upvotes: 0

Views: 155

Answers (1)

Count Orlok

Reputation: 1007

I checked the source code of the analyzeSentiment function to see what happens to the aggregate argument.

Interestingly, it does nothing in your code because the argument is effectively dead: it is never used at all! If you follow the call stack from the initial analyzeSentiment call, the aggregate argument is simply passed along until it reaches the main hub of the sentiment computation, the analyzeSentiment.DocumentTermMatrix method. That method computes and returns the data frame of results, and the value passed to aggregate does not appear anywhere in its code.

It is probably either a feature that was never implemented, or an artifact left over from a previous version and kept for backward compatibility.
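As a workaround, you can do the grouping yourself after scoring. Below is a minimal base R sketch, assuming a plain mean of the per-document scores is the grouping you want (the help text does not say how grouped documents are combined). The SentimentQDAP values are copied from the output in your question so the snippet runs without the package; in your session you would use sentiment$SentimentQDAP. Variable names such as group_scores, GroupSentiment and joined are just for illustration.

```r
# Workaround sketch: group the per-document scores manually, since
# analyzeSentiment() appears to ignore its aggregate argument.
# Assumption: a plain mean of document scores is the desired grouping.
documents <- c("Wow, I really like the new light sabers!",
               "That book was excellent.",
               "R is a fantastic language.",
               "The service in this restaurant was miserable.",
               "This is neither positive or negative.",
               "The waiter forget about my dessert -- what poor service!")
Test_data <- data.frame(documents = documents,
                        Group = factor(c(1, 1, 2, 2, 3, 3)),
                        stringsAsFactors = FALSE)

# SentimentQDAP values copied from the output in the question;
# in a real session use sentiment$SentimentQDAP instead.
Test_data$SentimentQDAP <- c(0.3333333, 0.5, 0.5, -0.3333333, 0, -0.4)

# One row per group: mean score of that group's documents
group_scores <- aggregate(SentimentQDAP ~ Group, data = Test_data, FUN = mean)
print(group_scores)

# Or keep one row per document, with its group's mean repeated
Test_data$GroupSentiment <- ave(Test_data$SentimentQDAP, Test_data$Group, FUN = mean)
print(Test_data)

# Alternative: join each group's documents into a single text; passing
# as.character(joined) to analyzeSentiment() would then score one text per group.
joined <- tapply(Test_data$documents, Test_data$Group, paste, collapse = " ")
print(joined)
```

Averaging weights every document equally regardless of its length. Joining the texts instead, closer to the "joining e.g. news from the same day" wording in the help, lets longer documents count for more, so the two approaches can give different group scores.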

Upvotes: 1
