Jackie

Reputation: 63

Text Analysis in R - word frequency

I only have R available at work, and I have done this before in Python. I need to get a count of each set of incidents in a CSV file. I previously did a sentiment analysis in Python, where I supplied a dictionary of phrases, Python searched the text, and it returned a table with the count for each phrase. I have been researching how to do this in R and have only found ways to do a general word count using a predetermined frequency.

Please let me know if anyone has any resource links on how to perform this in R. Thank you :)

Upvotes: 2

Views: 5986

Answers (2)

Dave2e

Reputation: 24079

The package tidytext is a good solution. Another option is to use the text mining package tm:

library(tm)

df <- read.csv(myfile)   # myfile: path to your CSV file

# Build a corpus from the text column
corpus <- Corpus(VectorSource(df$text))

# Clean the text: lowercase, then strip numbers, stop words, and punctuation
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removeNumbers)
corpus <- tm_map(corpus, removeWords, stopwords("english"))
# corpus <- tm_map(corpus, stemDocument, language = "english")   # optional stemming
corpus <- tm_map(corpus, removePunctuation)

# Term-document matrix; summing each row gives the total count per word
tdm <- TermDocumentMatrix(corpus)

tdmatrix <- as.matrix(tdm)
wordfreq <- sort(rowSums(tdmatrix), decreasing = TRUE)

The code example cleans up the text by removing stop words, numbers, and punctuation. The final result, wordfreq, is a named vector of word counts sorted in decreasing order, ready for use with the wordcloud package if interested.
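
For example, here is a minimal sketch of both follow-up steps; the counts data frame and its column names are illustrative, and wordcloud is a separate package you would need to install:

# Turn the named vector into the count table the question asks for
counts <- data.frame(word = names(wordfreq), freq = wordfreq, row.names = NULL)
head(counts, 10)

# Optional: draw a word cloud of the most frequent terms
library(wordcloud)
wordcloud(words = names(wordfreq), freq = wordfreq, max.words = 100)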

Upvotes: 3

Ryan John

Reputation: 1430

Here's a place to start: http://tidytextmining.com

library(dplyr)
library(tidytext)
library(janeaustenr)   # supplies the example corpus used in that book

# unnest_tokens() reshapes the data to one row per word
tidy_books <- austen_books() %>%
  unnest_tokens(word, text)

tidy_books

# Word frequency table, most common words first
tidy_books %>%
  count(word, sort = TRUE)
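
Since the question asks for counts of a provided set of terms, here is a minimal sketch of that step on top of the code above; my_phrases is a hypothetical stand-in for your own dictionary:

my_phrases <- c("outage", "delay", "error")   # hypothetical dictionary

tidy_books %>%
  filter(word %in% my_phrases) %>%
  count(word, sort = TRUE)

Note that unnest_tokens() splits into single words by default; for multi-word phrases you would tokenize with token = "ngrams" instead.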

Upvotes: 4
