Why are stopwords not filtered out in `tm` corporized term-document matrices?

Question

I'm building a term-document matrix using the tm library.

library(tm)

# Create corpus.
corporize <- function(dir_to_corporize)
{
    crp <- Corpus(DirSource(dir_to_corporize, mode="text", encoding="ASCII"),
                 readerControl=list(reader=readPlain, language="en_EN"))
    crp <- tm_map(crp, removeWords, stopwords("english"))
    crp <- tm_map(crp, removePunctuation, preserve_intra_word_dashes=F)
    crp <- tm_map(crp, removeNumbers)
    crp <- tm_map(crp, stripWhitespace)
    crp <- tm_map(crp, content_transformer(tolower))
}

However, when I inspect the resulting term-document matrix, I find that a few stopwords are still there:

the last time i saw
we need talk about kevin
you make me feel like

Why is that, and what can I do about it?

Answers (1)
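
A likely cause is the order of the transformations. removeWords matches case-sensitively, and stopwords("english") contains only lowercase entries, so a capitalized "The", "We", "You" or "I" survives the removal step and is only lowercased afterwards. The sketch below lowercases first and then removes stopwords. It is a minimal illustration, not a drop-in replacement for corporize: the three title strings are hypothetical inputs guessed from the output shown in the question, and VectorSource stands in for the DirSource used there.

```r
library(tm)

# Hypothetical inputs: assumed title-cased originals of the lines in the question.
titles <- c("The Last Time I Saw",
            "We Need to Talk About Kevin",
            "You Make Me Feel Like")
crp <- VCorpus(VectorSource(titles))

# Lowercase FIRST, so capitalized stopwords match the lowercase stopword list.
crp <- tm_map(crp, content_transformer(tolower))
crp <- tm_map(crp, removeWords, stopwords("english"))
crp <- tm_map(crp, removePunctuation)
crp <- tm_map(crp, removeNumbers)
crp <- tm_map(crp, stripWhitespace)

# Pull the cleaned text back out as a plain character vector.
cleaned <- trimws(unname(sapply(crp, as.character)))
print(cleaned)
```

In the original function, the same effect should come from moving the tm_map(crp, content_transformer(tolower)) line so that it runs before the removeWords line.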