santoku

Reputation: 3437

r text analysis stem completion

How to complete words after stemming in R?

library(tm)  # stemDocument() also requires the SnowballC package
x <- c("completed", "complete", "completion", "teach", "taught")
tm <- Corpus(VectorSource(x))
tm <- tm_map(tm, stemDocument)
inspect(tm)

This is a small example for illustration; the actual text corpus is much bigger.

I've searched for earlier examples, which point to creating a set of synonyms, but for a large corpus, how is it possible to get such a synonym dictionary? For verbs, how can I complete stemmed words to the present tense? Thanks.

Upvotes: 1

Views: 2006

Answers (1)

emilliman5

Reputation: 5966

The tm package has a function for this, stemCompletion():

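library(tm)  # stemDocument() also requires the SnowballC package
x <- c("completed", "complete", "completion", "teach", "taught")
tm <- Corpus(VectorSource(x))
dictCorpus <- tm  # copy the corpus before stemming so it can serve as the completion dictionary
tm <- tm_map(tm, stemDocument)
tm <- tm_map(tm, stripWhitespace)
inspect(tm)

# complete each stem against the words in the unstemmed corpus
tm <- tm_map(tm, stemCompletion, dictionary = dictCorpus)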

As for completing verbs to the present tense, I am not sure that is possible with tm. RWeka, word2vec, or qdap may have methods for it, but I have not checked.

A quick-and-dirty solution may be to set type = "shortest" in stemCompletion(). In general, I think present-tense words will be shorter than past-tense forms and gerunds.
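As a rough sketch of the shortest-completion idea, the snippet below calls stemCompletion() directly on a character vector of stems rather than through tm_map(). The variable names (words, stems, completed) are made up for illustration, and using the unstemmed words themselves as the dictionary is an assumption that suits this toy example; a real corpus would need a larger dictionary.

```r
library(tm)  # stemDocument() also requires the SnowballC package

# Toy input from the question; doubles as the completion dictionary here
words <- c("completed", "complete", "completion", "teach", "taught")

# Stem the plain character vector
stems <- stemDocument(words)

# For each stem, pick the shortest dictionary word that starts with it
completed <- stemCompletion(stems, dictionary = words, type = "shortest")
print(completed)
```

Completion is a prefix match against the dictionary, so irregular forms are not unified: the stem of "taught" does not share a prefix with "teach", and this approach will not map one to the other.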

Upvotes: 2
