rek

Reputation: 187

Remove specific word from a dfm

From this process

    library(stm)
    library(tidyr)
    library(quanteda)

    testDfm <- gadarian$open.ended.response %>%
        tokens(remove_punct = TRUE, remove_numbers = TRUE, remove_symbols = TRUE) %>%
        dfm()

Let's say we then check the term frequencies:

    dftextstat <- textstat_frequency(testDfm)

and we want to remove some specific words from the dfm. According to dftextstat, we want to remove c("and", "to"). Is there a way to do this on the existing dfm, without re-running the lines that create it?

Upvotes: 0

Views: 1525

Answers (1)

phiver

Reputation: 23608

If you already have a dfm, you can use dfm_remove to remove features.

Based on your example:

    # remove "and" and "to"
    testDfm <- dfm_remove(testDfm, c("and", "to"))

It is usually better to remove all the stopwords at once:

    dfm_remove(testDfm, stopwords("english"))

If you still have the tokens object, you can use tokens_remove in the same way, either on its own or as an extra step in a pipeline like the one in your question.
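As a rough sketch of the tokens route, the example below uses a small made-up character vector in place of gadarian$open.ended.response (the txt, toks_clean and dfm_clean names are only for illustration) and drops the two words before the dfm is built:

```r
library(quanteda)

# Toy texts standing in for gadarian$open.ended.response (illustrative only)
txt <- c("we want to go and see", "cats and dogs like to play")

# Tokenise, then drop the unwanted words before building the dfm
toks_clean <- tokens_remove(tokens(txt, remove_punct = TRUE), c("and", "to"))
dfm_clean <- dfm(toks_clean)

# Check that neither word survives as a feature
c("and", "to") %in% featnames(dfm_clean)
```

Removing at the tokens stage also matters if you later build n-grams or need token positions, since dfm_remove only drops columns from the finished matrix.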

Upvotes: 2
