Reputation: 681
Does anyone know why the aggregate argument of the analyzeSentiment function from the SentimentAnalysis package in R is not grouping sentiment scores? Here is a simple reproducible example:
> library(SentimentAnalysis)
> documents <- c("Wow, I really like the new light sabers!",
+ "That book was excellent.",
+ "R is a fantastic language.",
+ "The service in this restaurant was miserable.",
+ "This is neither positive or negative.",
+ "The waiter forget about my dessert -- what poor service!")
> Group=factor(c(1,1,2,2,3,3))
> Test_data=data.frame(documents=documents,Group=Group,stringsAsFactors = F)
> sentiment <- analyzeSentiment(x=Test_data$documents,aggregate=Test_data$Group)
> Test_data$SentimentQDAP=sentiment$SentimentQDAP
> Test_data
documents Group SentimentQDAP
1 Wow, I really like the new light sabers! 1 0.3333333
2 That book was excellent. 1 0.5000000
3 R is a fantastic language. 2 0.5000000
4 The service in this restaurant was miserable. 2 -0.3333333
5 This is neither positive or negative. 3 0.0000000
6 The waiter forget about my dessert -- what poor service! 3 -0.4000000
As the output above shows, each document still gets its own score.
The help for analyzeSentiment says: "aggregate: A factor variable by which documents can be grouped. This helpful when joining e.g. news from the same day or move reviews by the same author".
My question is: why are the scores not the same within each group?
Upvotes: 0
Views: 155
Reputation: 1007
I checked the source code of the analyzeSentiment function to see what happens with that aggregate argument.
Interestingly, the reason it's not doing anything in your code is that the argument is essentially redundant: it never gets used at all! If you follow the call stack starting from the initial analyzeSentiment function, the aggregate argument just gets passed around until it reaches the main hub of sentiment computation, analyzeSentiment.DocumentTermMatrix. This is where the data frame of results is computed and then returned, and the value passed to aggregate seems to make no appearance anywhere in that code. It's passed in and never used.
It must be either a feature that was never added to the package, or an artifact left over from a previous version for the sake of backward compatibility.
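If you still want one score per group, you can do the grouping yourself after the scores are computed. Here is a minimal sketch in base R. It assumes that averaging the per-document scores is what you want from "grouping"; the package help does not say how aggregated scores would be combined, so that choice is mine. The scores are copied from the output in the question so the snippet runs without the package, and GroupSentiment is just a column name I made up.

```r
# Scores copied from the question's output; in practice they would come from
# analyzeSentiment(Test_data$documents)$SentimentQDAP.
Test_data <- data.frame(
  Group = factor(c(1, 1, 2, 2, 3, 3)),
  SentimentQDAP = c(0.3333333, 0.5, 0.5, -0.3333333, 0.0, -0.4)
)

# One row per group: the mean sentiment of the documents in that group.
# (Averaging is an assumption, not documented package behaviour.)
group_means <- aggregate(SentimentQDAP ~ Group, data = Test_data, FUN = mean)
print(group_means)

# Alternatively, keep one row per document and attach the group mean to each,
# so that scores are identical within a group.
Test_data$GroupSentiment <- ave(Test_data$SentimentQDAP, Test_data$Group, FUN = mean)
print(Test_data)
```

Another option, closer to the "joining" wording in the help text, would be to paste the documents of each group into one string before calling analyzeSentiment, but that can give different numbers than averaging.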
Upvotes: 1