devcoder

Reputation: 1685

Is the average of individual sentiment scores for 5000 comments the same as the sentiment score of all 5000 comments concatenated?

I'm trying to do sentiment analysis on a reddit thread. The issue I'm facing is that the free tiers of some cloud NLP APIs (Google Natural Language, Azure Text Analytics, etc.) only allow 5000 calls per month. I'm trying to see whether I can concatenate comments, up to the maximum number of characters allowed per call, so that more comments get analyzed within the free tier.
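A minimal sketch of the batching I have in mind. The `pack_comments` helper, the `max_chars` value, and the newline separator are all placeholders of mine, not taken from any API; the real per-call character limit depends on the service and would need to be looked up.

```python
def pack_comments(comments, max_chars=5000, sep="\n"):
    """Greedily pack comments into batches of at most max_chars characters.

    max_chars is a placeholder; use the documented per-call limit of
    whichever API you call.
    """
    batches = []
    current = ""
    for comment in comments:
        # A single comment longer than the limit is truncated (an assumption;
        # you could also split it or skip it).
        comment = comment[:max_chars]
        candidate = comment if not current else current + sep + comment
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # The current batch is full: close it and start a new one.
            batches.append(current)
            current = comment
    if current:
        batches.append(current)
    return batches


example_batches = pack_comments(["a" * 6, "b" * 6, "c" * 3], max_chars=10, sep=" ")
```

Each returned batch would then be sent as one API call, so the number of calls is the number of batches rather than the number of comments.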

Upvotes: 2

Views: 169

Answers (1)

Adnan S

Reputation: 1882

Interesting question. If the comments were independent and unrelated to each other, then both concatenation and averaging would probably give you a roughly neutral score, much as the fraction of heads in a long series of fair coin tosses tends toward 0.5 rather than 1 or 0. That would not be very useful.

However, assuming you are analyzing a single reddit thread around one post (and not threads from multiple posts within a subreddit), you will likely get a similar result with concatenation or averaging. Comments in a reddit thread are generally related to the post and lean either positive or negative (or are completely unrelated). So your proposed concatenation approach should pick up the overall sentiment in your use case.

My theory (not backed by data yet) is that both averaging and concatenation will tend to pull your scores toward neutral, so you will not see strong positives or negatives.
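One way to check this on a small sample before spending API calls is to score the same comments both ways and compare. The sketch below uses a made-up word-list scorer (`toy_score`, with invented `POSITIVE` and `NEGATIVE` sets) purely as a stand-in; it is not how Google or Azure compute sentiment, and their behavior on concatenated text may differ.

```python
# Hypothetical toy lexicon, for illustration only.
POSITIVE = {"good", "great", "love"}
NEGATIVE = {"bad", "awful", "hate"}


def toy_score(text):
    """Return (positive words - negative words) / total words, or 0.0 if empty."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / len(tokens)


comments = ["love it", "bad bad bad and awful overall honestly"]

# Approach 1: score each comment separately, then take the plain average.
average = sum(toy_score(c) for c in comments) / len(comments)

# Approach 2: concatenate everything and score once.
concatenated = toy_score(" ".join(comments))

# For this length-normalized scorer, the concatenated score equals the
# average of per-comment scores weighted by comment length in words.
lengths = [len(c.split()) for c in comments]
weighted = sum(toy_score(c) * n for c, n in zip(comments, lengths)) / sum(lengths)

print(average, concatenated, weighted)
```

With this toy scorer, the long negative comment dominates the concatenated score, while the plain average treats both comments equally. Running the same comparison against the real API on a few dozen comments would show whether the two approaches agree closely enough for your thread.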

Upvotes: 0
