Remove specific word from a dfm

Question

Starting from this pipeline:

    library(stm)        # provides the gadarian data
    library(tidyr)      # re-exports the %>% pipe
    library(quanteda)

    testDfm <- gadarian$open.ended.response %>%
      tokens(remove_punct = TRUE, remove_numbers = TRUE, remove_symbols = TRUE) %>%
      dfm()

Let's say we check the term frequencies:

    library(quanteda.textstats)  # textstat_frequency() lives here since quanteda 3.0
    dftextstat <- textstat_frequency(testDfm)

Based on `dftextstat`, we now want to remove some specific words from the dfm, e.g. `c("and", "to")`. Is there a way to do this directly on the existing dfm, without rerunning the lines that create it?

phiver · Accepted Answer

If you already have a dfm, you can use dfm_remove to remove features.

Based on your example:

    # remove "and" and "to"
    testDfm <- dfm_remove(testDfm, c("and", "to"))
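A minimal, self-contained sketch of the same call, assuming a small hypothetical toy corpus (the `txt` vector and `toyDfm` name are invented for illustration and stand in for the `gadarian` responses). It checks the result with `nfeat()` and `featnames()`; the dfm is modified in place only in the sense that the reassignment replaces it, with no need to re-tokenise.

```r
library(quanteda)

# Hypothetical toy texts standing in for gadarian$open.ended.response
txt <- c(d1 = "Immigrants come here to work and to live.",
         d2 = "Jobs and taxes matter to me.")

# Build the dfm once (dfm() lowercases features by default)
toyDfm <- dfm(tokens(txt, remove_punct = TRUE, remove_numbers = TRUE,
                     remove_symbols = TRUE))
before <- nfeat(toyDfm)

# Drop the unwanted features directly from the existing dfm
toyDfm <- dfm_remove(toyDfm, c("and", "to"))

# Inspect the remaining features
print(featnames(toyDfm))
```

`dfm_remove(x, pattern)` is shorthand for `dfm_select(x, pattern, selection = "remove")`, so the same patterns (globs by default) work with either function.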

Even better, remove all English stopwords at once:

    testDfm <- dfm_remove(testDfm, stopwords("english"))
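A short sketch of stopword removal on the same hypothetical toy texts (invented for illustration). `stopwords("english")` returns a plain character vector (the Snowball list by default), so it can be inspected or extended before being passed to `dfm_remove()`.

```r
library(quanteda)

# Hypothetical toy texts (illustration only)
txt <- c(d1 = "Immigrants come here to work and to live.",
         d2 = "Jobs and taxes matter to me.")
toyDfm <- dfm(tokens(txt, remove_punct = TRUE))

# Character vector of English stopwords; "and", "to" and "me" are in it
sw <- stopwords("english")

# Remove every feature that matches a stopword
toyDfm <- dfm_remove(toyDfm, sw)
print(featnames(toyDfm))
```

Because `sw` is an ordinary vector, adding corpus-specific words is just `c(sw, "word1", "word2")`, where those extra words are placeholders.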

If you still have a tokens object, you can use `tokens_remove()` in the same way, either on its own or as an extra step in the pipeline above.
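A minimal sketch of removing the words at the tokens stage instead, again assuming the hypothetical toy texts used for illustration. The words are dropped before the dfm is built, so they never become features.

```r
library(quanteda)

# Hypothetical toy texts (illustration only)
txt <- c(d1 = "Immigrants come here to work and to live.",
         d2 = "Jobs and taxes matter to me.")

# Tokenise, remove the unwanted words, then build the dfm
toks <- tokens(txt, remove_punct = TRUE, remove_numbers = TRUE,
               remove_symbols = TRUE)
toks <- tokens_remove(toks, c("and", "to"))
toyDfm <- dfm(toks)

print(featnames(toyDfm))
```

`tokens_remove()` matches case-insensitively by default. If token positions matter later (for example when forming n-grams), `padding = TRUE` leaves empty placeholders instead of closing the gaps.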
