Pavan

Reputation: 71

POS Tagging & Theme/Pattern Detection in R

I am new to R and exploring text mining. With the steps below I can get as far as stemming, but I also need to do POS tagging and detect themes/patterns in the text. The data I am using is customer verbatim comments. Most of the articles I checked do not explain how to do POS tagging on data held in a Corpus, and I could not find any details on pattern detection. How should I proceed? Any help would be greatly appreciated. Thanks in advance.

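library(tm)         # Corpus, tm_map, TermDocumentMatrix, stopwords
library(qdap)       # sent_detect_nlp
library(SnowballC)  # stemming backend used by stemDocument

CSVfile = read.csv("Testfortextcsv.csv", stringsAsFactors = FALSE)
# Split the comments into sentences, one row per sentence
TestSplit = as.data.frame(sent_detect_nlp(CSVfile$Comment))
colnames(TestSplit)[colnames(TestSplit)=="sent_detect_nlp(CSVfile$Comment)"] <- "Comment"
TestCorpus = Corpus(VectorSource(TestSplit$Comment))
# Wrap base functions in content_transformer() so the corpus structure is kept;
# this also makes the PlainTextDocument step unnecessary
TestCorpus = tm_map(TestCorpus, content_transformer(tolower))
TestCorpus = tm_map(TestCorpus, removePunctuation)
# The text is already lower case here, so the custom stopword must be lower case too
TestCorpus = tm_map(TestCorpus, removeWords, c("test", stopwords("SMART"), stopwords("english")))
TestCorpus = tm_map(TestCorpus, stripWhitespace)
TestCorpus = tm_map(TestCorpus, stemDocument)
# Terms are rows and documents are columns, so rowSums gives total term frequency
dtm <- TermDocumentMatrix(TestCorpus)
m <- as.matrix(dtm)
v <- sort(rowSums(m), decreasing = TRUE)
d <- data.frame(word = names(v), freq = v)
head(d, 10)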

I used this to build a word cloud, term associations, and a bar plot.


WordCloud
---------
library(wordcloud)
library(RColorBrewer)  # brewer.pal
set.seed(1234)
wordcloud(words = d$word, freq = d$freq, min.freq = 1, max.words = 200,
          random.order = FALSE, rot.per = 0.35, colors = brewer.pal(8, "Dark2"))

Find Frequent Terms
-------------------
findFreqTerms(dtm, lowfreq = 15)

Find Association
----------------
findAssocs(dtm, terms = "account", corlimit = 0.3)

Bar Plot for frequencies
------------------------
barplot(d[1:10,]$freq, las = 2, names.arg = d[1:10,]$word, col = "lightblue", main = "Most frequent words", ylab = "Word frequencies")

Upvotes: 1

Views: 972

Answers (1)

lawyeR

Reputation: 7664

The qdap package allows you to identify the part of speech of each word in a string:

library(qdap)
s1 <- "Hello World"
# pos() returns the part-of-speech tag for each word in the text
pos(s1)

You might also find other resources useful, such as openNLP and RTextTools, among other possibilities.
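As a rough illustration of the openNLP route, here is a minimal sketch. It assumes Java and the NLP, openNLP and openNLPdata packages are installed. The helper name tag_pos and the sample sentences are made up for this example and are not part of any package. It is probably best to run the tagger on the original sentences, before lower-casing, stopword removal and stemming, since taggers generally rely on capitalisation and full word forms. The last few lines show one crude way to look for a "pattern" (an adjective directly followed by a noun); real theme detection would likely need more than this.

```r
# Sketch only: POS tagging with openNLP, then a simple adjective+noun pattern.
library(NLP)
library(openNLP)

# Hypothetical helper (not from any package): one row per token with its tag.
tag_pos <- function(text) {
  s <- as.String(text)
  # Sentence and word annotations must exist before POS tags can be added.
  a <- annotate(s, list(Maxent_Sent_Token_Annotator(),
                        Maxent_Word_Token_Annotator()))
  a <- annotate(s, Maxent_POS_Tag_Annotator(), a)
  w <- subset(a, type == "word")
  data.frame(
    token = s[w],
    tag   = sapply(w$features, `[[`, "POS"),
    stringsAsFactors = FALSE
  )
}

# Made-up sample text standing in for the customer comments.
tagged <- tag_pos("The new account page is slow. Customer support was helpful.")
print(tagged)

# Crude pattern: a token tagged JJ* immediately followed by one tagged NN*.
# This ignores sentence boundaries, so treat the result as a rough starting point.
n <- nrow(tagged)
is_adj  <- grepl("^JJ", tagged$tag[-n])
is_noun <- grepl("^NN", tagged$tag[-1])
pairs <- paste(tagged$token[-n], tagged$token[-1])[is_adj & is_noun]
print(pairs)
```

To apply this to the question's data, something like lapply(TestSplit$Comment, tag_pos) would tag each sentence separately, assuming TestSplit is built as in the question.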

Upvotes: 3
