Reputation: 63
I only have R available at work, and I have done this before in Python. I need to get a count of each set of incidents in a CSV file. I previously did a sentiment analysis in Python, where I had a dictionary of phrases that Python searched for, and it produced a table with the count for each phrase. I am researching how to do this in R and have only found ways to do a general word count, not counts for a predetermined set of phrases.
Please let me know if anyone has any resource links on how to perform this in R. Thank you :)
Upvotes: 2
Views: 5986
Reputation: 24079
The package tidytext is a good solution. Another option is the text mining package tm:
library(tm)

df <- read.csv(myfile, stringsAsFactors = FALSE)

# Build a corpus from the text column and clean it up
corpus <- Corpus(VectorSource(df$text))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removeNumbers)
corpus <- tm_map(corpus, removeWords, stopwords('english'))
# corpus <- tm_map(corpus, stemDocument, language = "english")  # optional stemming
corpus <- tm_map(corpus, removePunctuation)

# Term-document matrix: one row per term, one column per document
tdm <- TermDocumentMatrix(corpus)
tdmatrix <- as.matrix(tdm)

# Total frequency of each term across all documents, highest first
wordfreq <- sort(rowSums(tdmatrix), decreasing = TRUE)
The code above cleans up the text by lowercasing it and removing stop words, numbers, and punctuation. The final result, wordfreq, is a named vector of term counts and is ready to use with the wordcloud package if you're interested.
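Since the question asks for counts of a predetermined set of phrases rather than all words, one sketch of that step, assuming the dictionary entries are single words that survive the cleaning above (the terms in incident_dict are hypothetical examples, not from the question), is to index wordfreq by the dictionary:

```r
# Hypothetical dictionary of incident terms
incident_dict <- c("outage", "breach", "failure")

# Look up each dictionary term; terms absent from the corpus come back NA
dict_counts <- wordfreq[incident_dict]
dict_counts[is.na(dict_counts)] <- 0
names(dict_counts) <- incident_dict

# A table of counts, one row per dictionary term
data.frame(term = incident_dict, count = as.integer(dict_counts))
```

Note that tm lowercases and strips punctuation before counting, so the dictionary terms should be lowercase single tokens for this lookup to match.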
Upvotes: 3
Reputation: 1430
Here's a place to start: http://tidytextmining.com
library(dplyr)
library(tidytext)

# original_books is a data frame with a 'text' column,
# as in the examples at tidytextmining.com

# Split the text column into one word per row
tidy_books <- original_books %>%
  unnest_tokens(word, text)

tidy_books

# Count occurrences of each word, most frequent first
tidy_books %>%
  count(word, sort = TRUE)
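To count only a predetermined dictionary of terms (what the question actually asks for), one sketch in the same tidytext style, assuming the CSV has a text column and using a hypothetical dictionary data frame (the file name and terms are placeholders, not from the question), is to join the tokenized text against the dictionary:

```r
library(dplyr)
library(tidytext)

# Hypothetical dictionary of incident terms
dictionary <- data.frame(word = c("outage", "breach", "failure"))

df <- read.csv("incidents.csv", stringsAsFactors = FALSE)  # assumed file name

df %>%
  unnest_tokens(word, text) %>%            # one word per row
  inner_join(dictionary, by = "word") %>%  # keep only dictionary words
  count(word, sort = TRUE)                 # count per dictionary term
```

For multi-word phrases, unnest_tokens(ngram, text, token = "ngrams", n = 2) tokenizes into bigrams, which can be joined against a phrase dictionary in the same way.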
Upvotes: 4