Reputation: 187
Starting from this process:
library(stm)
library(tidyr)
library(quanteda)
library(quanteda.textstats)  # textstat_frequency() lives in this package as of quanteda 3.0
testDfm <- gadarian$open.ended.response %>%
tokens(remove_punct = TRUE, remove_numbers = TRUE, remove_symbols = TRUE) %>%
dfm()
Let's say we check the feature frequencies:
dftextstat <- textstat_frequency(testDfm)
and we want to remove some specific words from the dfm. According to dftextstat, we want to remove c("and", "to").
Is there a way to do this on the existing dfm, without rerunning the lines that create it?
Upvotes: 0
Views: 1525
Reputation: 23608
If you already have a dfm, you can use dfm_remove to remove features.
Based on your example:
# remove "and" and "to"
testDfm <- dfm_remove(testDfm, c("and", "to"))
It may be better to remove all the English stopwords at once with:
dfm_remove(testDfm, stopwords("english"))
If you still have the tokens object, you can use tokens_remove in the same way, either on its own or inside a pipeline like the one above.
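As a rough sketch of the two routes, the snippet below removes the same words once at the tokens stage and once from an existing dfm, then checks that both give the same features. The toy `txt` vector is a hypothetical stand-in for gadarian$open.ended.response, and the object names are made up for illustration.

```r
library(quanteda)

# Hypothetical toy responses standing in for gadarian$open.ended.response
txt <- c("we want to stop and think", "to be and not to be")

toks <- tokens(txt, remove_punct = TRUE, remove_numbers = TRUE,
               remove_symbols = TRUE)

# Route 1: drop the words at the tokens stage, then build the dfm
dfm_from_tokens <- dfm(tokens_remove(toks, c("and", "to")))

# Route 2: build the dfm first, then drop the features from it
dfm_direct <- dfm_remove(dfm(toks), c("and", "to"))

# Inspect the remaining features; "and" and "to" should be gone in both
featnames(dfm_from_tokens)
featnames(dfm_direct)
```

Both functions return a new object rather than modifying the input, so the result has to be assigned (as in testDfm <- dfm_remove(...)) for the change to stick.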
Upvotes: 2