Reputation: 3437
How to complete words after stemming in R?
```r
library(tm)  # stemDocument() also needs the SnowballC package installed
x <- c("completed", "complete", "completion", "teach", "taught")
tm <- Corpus(VectorSource(x))
tm <- tm_map(tm, stemDocument)
inspect(tm)
```
This is a toy example for illustration; the actual text corpus is much bigger.
I've searched earlier questions, which point to creating a set of synonyms, but for a large corpus, how could I build such a synonym dictionary? And for verbs, how can I complete stemmed words to the present tense? Thanks
Reputation: 5966
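The tm package has a function, stemCompletion(), which completes stems using a dictionary of unstemmed words. The dictionary has to be a copy of the corpus taken before stemming, and stemCompletion() works on a character vector of stems rather than on a corpus:
```r
library(tm)  # stemDocument() also needs the SnowballC package installed
x <- c("completed", "complete", "completion", "teach", "taught")
tm <- Corpus(VectorSource(x))
dictCorpus <- tm                  # unstemmed copy, used as the completion dictionary
tm <- tm_map(tm, stemDocument)
tm <- tm_map(tm, stripWhitespace)
inspect(tm)
# stemCompletion() expects a character vector of stems, so pull the text out of the corpus
stems <- vapply(seq_len(length(tm)), function(i) as.character(tm[[i]]), character(1))
stemCompletion(stems, dictionary = dictCorpus)
```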
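In a real corpus each document holds many words, but stemCompletion() still completes one stem at a time. A minimal sketch of that per-document loop, assuming a naive whitespace tokenizer and two made-up sample documents (`docs` and `complete_doc` are hypothetical names, not part of tm); it stems with SnowballC::wordStem() and completes against the unstemmed vocabulary:

```r
library(tm)        # provides stemCompletion()
library(SnowballC) # provides wordStem()

docs <- c("completed the teaching", "completion of teaching")  # hypothetical sample documents
tokens <- strsplit(tolower(docs), "\\s+")                      # naive whitespace tokenizer (assumption)
dictionary <- unique(unlist(tokens))                            # original words serve as the dictionary

# Stem every token of one document, complete each stem, and glue the words back together
complete_doc <- function(tok) {
  stems <- wordStem(tok, language = "english")
  paste(stemCompletion(stems, dictionary = dictionary, type = "shortest"), collapse = " ")
}

completed <- vapply(tokens, complete_doc, character(1))
print(completed)
```

With type = "shortest", both "completed" and "completion" come back as "completed": completion picks one representative word per stem, it does not restore each original word.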
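As for converting verbs to the present tense, I am not sure that is possible with tm. RWeka, word2vec or qdap might have methods for it, but I have not checked. Note that stemming alone will never relate an irregular form such as "taught" to "teach"; that requires lemmatization.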
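For a handful of irregular verbs, one workaround is a small hand-made lookup applied before stemming, in the spirit of the synonym dictionary mentioned in the question. This is only a sketch: the `irregular` table and the `normalise_verbs()` helper are hypothetical, and a lookup covering a large corpus would have to be built or sourced separately:

```r
library(SnowballC) # provides wordStem()

# Hypothetical lookup: irregular form -> base form
irregular <- c(taught = "teach", went = "go", bought = "buy")

# Replace any token found in the lookup by its base form; other tokens pass through unchanged
normalise_verbs <- function(words, lookup) {
  hit <- words %in% names(lookup)
  words[hit] <- lookup[words[hit]]
  words
}

words_in <- c("completed", "complete", "completion", "teach", "taught")
base <- normalise_verbs(words_in, irregular)
stems <- wordStem(base, language = "english")
print(stems)
```

After the lookup, "taught" and "teach" share the stem "teach", so stemCompletion() can treat them as one word.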
A quick and dirty solution may be to set type = "shortest" in stemCompletion() (stemDocument() has no type argument): in general, I think present-tense forms will be shorter than past-tense forms and gerunds.
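To see how much the completion strategy matters, the same stems can be completed with the default type = "prevalent" and with type = "shortest". A small sketch using the question's five words; it calls SnowballC::wordStem() directly on the character vector instead of going through a corpus, which is an assumption of this sketch rather than what the code above does:

```r
library(tm)        # provides stemCompletion()
library(SnowballC) # provides wordStem()

words_in <- c("completed", "complete", "completion", "teach", "taught")
stems <- wordStem(words_in, language = "english")

# Complete the same stems against the original words with two different strategies
prevalent <- stemCompletion(stems, dictionary = words_in, type = "prevalent")
shortest  <- stemCompletion(stems, dictionary = words_in, type = "shortest")

print(data.frame(stem = stems, prevalent = unname(prevalent), shortest = unname(shortest)))
```

With "shortest", all three "complet" stems map to "complete", while "taught" stays "taught" because it does not share a stem with "teach".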