I have a language variable in my dataset that looks similar to this (keep in mind there are a lot more languages than shown below):
> dput(dt$LanguageDSC)
c("English", "English", "English", "Portuguese", "English", "English",
"English", "English", "English", "Mandarin", "English", "English",
"English", "English", "English", "English", "English", "English",
"English", "English", "English", "English", "English", "English",
"English", "English", "English", "English", "English", "English",
"English", "English", "English", "English", "English", "English",
"English", "English", "English", "English", "English", "English",
"English", "Spanish", "English", "English", "English", "English",
"English", "English", "English", "English", "English", "English",
"English", "English", "English", "English", "English", "English",
"English", "English", "English", "English", "English", "English",
"English", "English", "English", "English", "English", "English",
"English", "Spanish", "Spanish", "English", "English", "English",
"English", "English", "English", "English", "English", "English",
"English", "English", "English", "English", "Arabic", "Spanish",
"English", "English", "English", "English", "English", "English",
"English", "English", "English", "English")
Since my dataset has around 30 different languages, I want to collapse the less common ones into a single group. I want the following categories:
English
Spanish
Cantonese
Mandarin
Vietnamese
Other (all other languages)
So far I have this, but it only produces 'English' or 'Other'. How can I modify it so that the other four languages listed above are also kept as their own categories?
setDT(dt)[!(LanguageDSC == "English"), LanguageDSC := "Other"]
We may use %in% with ! to select every row whose language is not in the list of languages to keep, and recode only those rows to "Other":
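library(data.table)
# Languages that keep their own category
slt_langs <- c("English", "Spanish", "Cantonese",
"Mandarin", "Vietnamese")
# Recode every row whose language is not in slt_langs to "Other"
setDT(dt)[!(LanguageDSC %in% slt_langs),
LanguageDSC := "Other"]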
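To check the recoding, here is a minimal sketch on a small toy table. The data below is hypothetical: it takes a few languages from the dput() output in the question and adds Cantonese and Vietnamese so every kept category appears. The final factor() step is an optional extra, not part of the answer above; it fixes the category order with "Other" last, which is convenient for tables and plots.

```r
library(data.table)

# Hypothetical toy data: some languages from the question, plus Cantonese
# and Vietnamese so that every kept category is present
dt <- data.table(LanguageDSC = c("English", "Portuguese", "Mandarin",
                                 "Spanish", "Arabic", "Cantonese",
                                 "Vietnamese"))

slt_langs <- c("English", "Spanish", "Cantonese", "Mandarin", "Vietnamese")

# Recode in place: any language not in slt_langs becomes "Other"
dt[!(LanguageDSC %in% slt_langs), LanguageDSC := "Other"]

# Optional: convert to a factor with a fixed level order, "Other" last
dt[, LanguageDSC := factor(LanguageDSC, levels = c(slt_langs, "Other"))]

print(table(dt$LanguageDSC))
```

Note that missing values are recoded too: NA %in% slt_langs is FALSE, so !(...) is TRUE and any NA in LanguageDSC becomes "Other". If you want to keep NA as missing, add !is.na(LanguageDSC) to the row condition.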