How to remove common word endings from a non-English corpus using the tm package?

Question

I am trying to do some text mining with the tm package on reviews written by Italian users of a certain website. I scraped the texts, stored them in a corpus, and did some cleaning, but when I try to stem the words by removing common endings, I have trouble specifying Italian instead of the default language, English.

reviews_corpus <- tm_map(reviews_corpus, removeNumbers)
reviews_corpus <- tm_map(reviews_corpus, removePunctuation)
reviews_corpus <- tm_map(reviews_corpus, stripWhitespace)
reviews_corpus <- tm_map(reviews_corpus, content_transformer(tolower))
reviews_corpus <- tm_map(reviews_corpus, removeWords, stopwords("italian"))
reviews_corpus <- tm_map(reviews_corpus, stemDocument(reviews_corpus, language="italian"))

The first five lines work fine, but for the last one R gives me:

Error in UseMethod("stemDocument", x) : 
  no applicable method for 'stemDocument' applied to an object of class "c('VCorpus', 'Corpus')"

So my question is: how can I use stemDocument on a corpus while specifying the language to be used?

Answers (1)
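
The error most likely comes from calling stemDocument(reviews_corpus, ...) directly inside tm_map: that call is evaluated first, on the whole corpus, and stemDocument has no method for a VCorpus. A minimal sketch of the usual fix is to pass the function itself to tm_map and give language as an extra argument, which tm_map forwards to stemDocument for each document. The two sample reviews below are made up for illustration and stand in for the scraped texts; the sketch also assumes the SnowballC package is installed, since tm relies on it for stemming.

```r
library(tm)  # stemDocument relies on the SnowballC package being installed

# Hypothetical sample data standing in for the scraped reviews
reviews <- c("I gatti mangiano i pesci",
             "Le ragazze cantavano canzoni bellissime")
reviews_corpus <- VCorpus(VectorSource(reviews))

# Same kind of cleaning as in the question
reviews_corpus <- tm_map(reviews_corpus, content_transformer(tolower))
reviews_corpus <- tm_map(reviews_corpus, removeWords, stopwords("italian"))
reviews_corpus <- tm_map(reviews_corpus, stripWhitespace)

# Pass the function, not a call to it; tm_map forwards
# language = "italian" to stemDocument for every document
reviews_corpus <- tm_map(reviews_corpus, stemDocument, language = "italian")

# Pull the stemmed text back out as a character vector to inspect it
stemmed <- sapply(reviews_corpus, as.character)
print(stemmed)
```

This follows the same pattern as the removeWords line in the question, where stopwords("italian") is also handed to tm_map as an extra argument rather than passed in a direct call.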
