Dear Stack Overflow crowd,
I managed to use the qdap polarity function to calculate the polarity of some blog entries, loading my own dictionary based on SentiWS. Now I have a new sentiment dictionary (SePL) that contains not only single words but also phrases. For example, in "simply good", "simply" is neither a negator nor an amplifier, but it makes the sentiment more precise. So I was wondering whether I could search for ngrams using the polarity function of qdap.
As an example:
library(qdap)
phrase <- "This is simply the best"
key <- sentiment_frame(c("simply", "best", "simply the best"), "", c(0.1,0.3,0.8))
counts(polarity(phrase, polarity.frame=key))
gives:
  all wc polarity    pos.words neg.words                text.var
1 all  5    0.179 simply, best         - This is simply the best
However, I would like to get an output like:
  all wc polarity       pos.words neg.words                text.var
1 all  5     0.76 simply the best         - This is simply the best
Does anyone have an idea how to get this working?
All the best, Ben
This is a bug that was reintroduced by changes to the bag_o_words function earlier this year. It is the second time a bug like this has affected ngram polarity since I enabled the use of ngrams in polarity.frame: https://github.com/trinker/qdap/issues/185
I have fixed the bug and added a unit test to ensure it doesn't creep back into the code. In qdap 2.2.1 your code now gives the desired output, though the caveat that this use goes against the algorithm's original design still applies:
> library(qdap)
> phrase <- "This is simply the best"
> key <- sentiment_frame(c("simply", "best", "simply the best"), "", c(0.1,0.3,0.8))
> counts(polarity(phrase, polarity.frame=key))
  all wc polarity       pos.words neg.words                text.var
1 all  5    0.358 simply the best         - This is simply the best
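As a sanity check on these numbers: the qdap polarity documentation describes the score as the sum of the (weighted) polarized context cluster divided by the square root of the word count. Assuming that rule, and assuming no amplifiers, de-amplifiers or negators fire in this sentence, both the buggy 0.179 and the fixed 0.358 can be reproduced with plain base-R arithmetic. Before the fix, "simply" and "best" were scored as separate words; after it, only the trigram is scored. This is a hedged reconstruction, not qdap's code.

```r
# Assumed scoring rule (per the qdap polarity docs):
#   polarity = sum of matched term weights / sqrt(word count)
# This sketch assumes no amplifiers, de-amplifiers or negators apply.
wc <- 5  # "This is simply the best" has five words

# Before the fix: "simply" (0.1) and "best" (0.3) matched as separate words
before_fix <- round((0.1 + 0.3) / sqrt(wc), 3)

# After the fix: only the trigram "simply the best" (0.8) is matched
after_fix <- round(0.8 / sqrt(wc), 3)

print(c(before_fix = before_fix, after_fix = after_fix))
# before_fix = 0.179, after_fix = 0.358
```

Under the same assumption, the 0.8 weight is still divided by sqrt(5), which would explain why the result is 0.358 rather than the 0.76 hoped for in the question.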
qdap's polarity function uses an algorithm that was not designed to operate like this. You can do it with the following hack, but be aware that it falls outside the intent of the theory underlying the function's algorithm:
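library(qdap)
phrase <- "This is simply the best"
terms <- c("simply", "best", "simply the best")
# Join the words of each multi-word term with "xxx" so it becomes a single token
key <- sentiment_frame(space_fill(terms, terms, sep="xxx"), NULL, c(0.1,0.3,0.8))
# Apply the same joining to the text before scoring it
counts(polarity(space_fill(phrase, terms, "xxx"), polarity.frame=key))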
##   all wc polarity           pos.words neg.words                    text.var
## 1 all  3    0.462 simplyxxxthexxxbest         - This is simplyxxxthexxxbest
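The hack scores higher than the fixed version because joining "simply the best" into the single token simplyxxxthexxxbest shrinks the word count from 5 to 3. Assuming the score is the matched weight divided by sqrt(word count), that gives 0.8 / sqrt(3), about 0.462. The base-R sketch below imitates the joining step with a hypothetical helper, join_phrases(); it is not qdap's space_fill implementation, only an approximation of the idea.

```r
# Hypothetical helper (not qdap's space_fill): replaces the spaces inside each
# multi-word term with `sep` so a space-based tokenizer sees one token.
# gsub(fixed = TRUE) matches plain substrings and does not check word boundaries.
join_phrases <- function(text, terms, sep = "xxx") {
  # Only terms containing a space need joining; longer ones go first so a
  # shorter phrase cannot break up a longer phrase that contains it.
  multi <- terms[grepl(" ", terms, fixed = TRUE)]
  multi <- multi[order(nchar(multi), decreasing = TRUE)]
  for (term in multi) {
    text <- gsub(term, gsub(" ", sep, term, fixed = TRUE), text, fixed = TRUE)
  }
  text
}

phrase <- "This is simply the best"
terms  <- c("simply", "best", "simply the best")

key_terms <- join_phrases(terms, terms)   # joined dictionary entries
joined    <- join_phrases(phrase, terms)  # joined text

# Count space-separated tokens, roughly what the wc column reports
n_words <- length(strsplit(joined, " ", fixed = TRUE)[[1]])
# Assumed scoring rule: matched weight / sqrt(word count)
score <- round(0.8 / sqrt(n_words), 3)

print(key_terms)  # "simply" "best" "simplyxxxthexxxbest"
print(joined)     # "This is simplyxxxthexxxbest"
print(n_words)    # 3
print(score)      # 0.462
```

Processing longer phrases first matters only when one dictionary phrase is contained in another; with the three terms here the order makes no difference.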