Reputation: 3
I am trying to run a sentiment analysis in R on my DTM (document-term matrix) or TDM (term-document matrix). I could not find a similar topic in this forum or on Google. So far I have created a corpus and generated a DTM/TDM from it. My next step is to apply the sentiment analysis, which I need later for stock prediction via SVM. My code so far:
dtm <- DocumentTermMatrix(docs)
dtm <- removeSparseTerms(dtm, 0.99)
dtm <- as.data.frame(as.matrix(dtm))
tdm <- TermDocumentMatrix(docs)
tdm <- removeSparseTerms(tdm, 0.99)
tdm <- as.data.frame(as.matrix(tdm))
I read that this is possible with the tidytext package via the get_sentiments() function, but I could not get it to work with a DTM/TDM. How can I run a sentiment analysis on my cleaned, filtered words, which are already stemmed, tokenized, etc.? I have seen many people run sentiment analysis on whole sentences, but I would like to apply it to my individual words to see whether each one is positive or negative, what its score is, and so on. Many thanks in advance!
Upvotes: 0
Views: 1642
Reputation: 499
To use tidytext on a DTM, first convert the DTM to tidy format and then inner-join the tidy data with a lexicon of polarised words. I will use the same documents as in the other answer. Some of the documents in that example are positive but were given a neutral score. Let's see how tidytext performs.
library(tidytext)
library(tm)
library(dplyr)
library(tidyr)
documents <- c("Wow I really like the new light sabers",
"That book was excellent",
"R is a fantastic language",
"The service in this restaurant was miserable",
"This is neither positive or negative",
"The waiter forget about my dessert -- what poor service")
# create tidy format
vectors <- as.character(documents)
v_source <- VectorSource(vectors)
corpuss <- VCorpus(v_source)
dtm <- DocumentTermMatrix(corpuss)
as_tidy <- tidy(dtm)
# use the bing lexicon; other lexicons (nrc, afinn) work as well
bing <- get_sentiments("bing")
as_bing_words <- inner_join(as_tidy, bing, by = c("term" = "word"))
# check the matched positive and negative words
as_bing_words
# add a numeric document index
index <- as_bing_words %>% mutate(doc = as.numeric(document))
# count matched words by sentiment and document
index <- index %>% count(sentiment, doc)
# spread into positive and negative columns
index <- index %>% spread(sentiment, n, fill = 0)
# add a polarity score
index <- index %>% mutate(polarity = positive - negative)
index
Docs 4 and 6 are negative, doc 5 is neutral, and the rest are positive, which matches the actual content.
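Since the question asks about individual words rather than whole documents, here is a minimal sketch of a per-term lookup. It takes the distinct terms of a DTM and attaches the bing label to each one. The two-sentence `documents` vector and the name `term_sentiment` are only illustrative, and the sketch assumes the terms are unstemmed; stemmed terms may not match the lexicon entries.

```r
library(tm)
library(tidytext)
library(dplyr)

# Illustrative input (not the asker's data)
documents <- c("That book was excellent",
               "The service in this restaurant was miserable")

dtm <- DocumentTermMatrix(VCorpus(VectorSource(documents)))

# One row per distinct term; sentiment is NA when the lexicon has no entry
term_sentiment <- tibble(term = Terms(dtm)) %>%
  left_join(get_sentiments("bing"), by = c("term" = "word"))

# Keep only the terms the lexicon recognises
term_sentiment %>% filter(!is.na(sentiment))
```

A left join keeps every term, so you can also see which of your words the lexicon does not cover.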
Upvotes: 0
Reputation: 8846
SentimentAnalysis has good integration with tm.
library(tm)
library(SentimentAnalysis)
documents <- c("Wow, I really like the new light sabers!",
"That book was excellent.",
"R is a fantastic language.",
"The service in this restaurant was miserable.",
"This is neither positive or negative.",
"The waiter forget about my dessert -- what poor service!")
vc <- VCorpus(VectorSource(documents))
dtm <- DocumentTermMatrix(vc)
analyzeSentiment(dtm,
  rules = list(
    "SentimentLM" = list(
      ruleSentiment, loadDictionaryLM()
    ),
    "SentimentQDAP" = list(
      ruleSentiment, loadDictionaryQDAP()
    )
  )
)
# SentimentLM SentimentQDAP
# 1 0.000 0.1428571
# 2 0.000 0.0000000
# 3 0.000 0.0000000
# 4 0.000 0.0000000
# 5 0.000 0.0000000
# 6 -0.125 -0.2500000
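As a variation, and only as a sketch, analyzeSentiment() can also be given the raw character vector, in which case the package does its own preprocessing before building the matrix. The scores can then be turned into labels with convertToDirection(). The variable names `sentiment` and `direction` are mine, and the scores may differ from the DTM-based output above because the preprocessing differs.

```r
library(SentimentAnalysis)

documents <- c("Wow, I really like the new light sabers!",
               "That book was excellent.",
               "R is a fantastic language.",
               "The service in this restaurant was miserable.",
               "This is neither positive or negative.",
               "The waiter forget about my dessert -- what poor service!")

# Let the package preprocess the raw text itself
sentiment <- analyzeSentiment(documents)

# Map the continuous QDAP score to negative / neutral / positive
direction <- convertToDirection(sentiment$SentimentQDAP)
direction
```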
Upvotes: 1