Reputation: 8117
I have two factors:
foo_1 <- factor(c("a", "b", "c", "Other"))
foo_2 <- factor(c("a", "b", "x"))
I want to recode foo_2 so that any of its levels (levels(foo_2)) that do not occur in foo_1 (here "x") are recoded to the "Other" level. So, something like
bar(foo_2, foo_1)
[1] a b Other
Levels: a b c Other
Background
I am building randomForest() models, and the prediction data can contain levels that do not exist in the development data. In that case prediction is not possible, which is very annoying. (foo_1 is the vector from the development data and foo_2 is the one from the prediction data.) I would bet that others have had the same problem before and that the answer is out there, but I couldn't find it.
I would love a solution using the forcats package, but other ways are also highly welcome.
Thanks in advance.
Upvotes: 0
Views: 64
Reputation: 388907
A simple way would be:
foo_3 <- factor(foo_2, levels = levels(foo_1))
foo_3[is.na(foo_3)] <- 'Other'
foo_3
#[1] a b Other
#Levels: a b c Other
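One caveat with this approach: is.na(foo_3) is also TRUE for values that were already missing in foo_2, so those would be turned into "Other" as well. If that is not wanted, a minimal sketch along the following lines might help. Note that recode_unseen is a hypothetical helper name, not an existing function, and the sketch assumes that the "Other" level already exists in the reference factor.

```r
# Hypothetical helper (not part of base R or forcats): recode values of `x`
# that are not levels of `ref` to `other`, leaving genuine NAs untouched.
recode_unseen <- function(x, ref, other = "Other") {
  # Assumption: `other` is already one of the levels of `ref`.
  stopifnot(is.factor(ref), other %in% levels(ref))
  x_chr <- as.character(x)
  # Present in x but unknown to ref; values that were NA to begin with stay NA.
  unseen <- !is.na(x_chr) & !(x_chr %in% levels(ref))
  x_chr[unseen] <- other
  # Rebuild the factor with exactly the levels (and level order) of ref.
  factor(x_chr, levels = levels(ref))
}

foo_1 <- factor(c("a", "b", "c", "Other"))
foo_2 <- factor(c("a", "b", "x", NA))
foo_3 <- recode_unseen(foo_2, foo_1)
as.character(foo_3)  # "a" "b" "Other" NA
```

The result carries the same levels as foo_1, in the same order.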
In forcats I could find fct_other, which does exactly that, but it does not keep the levels that are missing from foo_2 (here c), so you have to add them afterwards.
library(forcats)
foo_3 <- fct_other(foo_2, levels(foo_1))
foo_3 <- fct_expand(foo_3, levels(foo_1))
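Since fct_expand appends new levels at the end by default, the level order of the result may differ from levels(foo_1). Whether that matters depends on the model function, but if identical level order is wanted, a sketch like the following could realign it with base R's factor(). This is an illustration, not something the forcats documentation prescribes for this case.

```r
library(forcats)

foo_1 <- factor(c("a", "b", "c", "Other"))
foo_2 <- factor(c("a", "b", "x"))

# Lump everything that is not a level of foo_1 into "Other" ...
foo_3 <- fct_other(foo_2, keep = levels(foo_1))
# ... and add back the levels of foo_1 that do not occur in foo_2.
foo_3 <- fct_expand(foo_3, levels(foo_1))

# Realign the level order with foo_1 (the values themselves are unchanged).
foo_3 <- factor(foo_3, levels = levels(foo_1))
identical(levels(foo_3), levels(foo_1))
```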
Upvotes: 1
Reputation: 8117
Expanding on Ronak's answer and making it a little more elegant using the magrittr pipe (%>%):
library(forcats)
library(magrittr)  # provides %>%
foo_2 %>% fct_expand(levels(foo_1)) %>% fct_other(levels(foo_1))
[1] a b Other
Levels: a b c Other
Upvotes: 0
Reputation: 101247
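What about something like the following, which matches the values of foo_2 (not only its levels) against the levels of foo_1?
> replace(u <- factor(levels(foo_1)[match(foo_2, levels(foo_1))], levels = levels(foo_1)), is.na(u), "Other")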
[1] a b Other
Levels: a b c Other
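For the randomForest scenario from the question, the same value-wise matching could be applied to every factor column that the development and prediction data share. The following is only a sketch in base R: dev_data and pred_data are hypothetical data frames, and it assumes that "Other" is already a level in each development column.

```r
# Hypothetical development and prediction data (names are illustrative).
dev_data  <- data.frame(g = factor(c("a", "b", "c", "Other")), n = c(1, 2, 3, 4))
pred_data <- data.frame(g = factor(c("a", "b", "x", "b")), n = c(5, 6, 7, 8))

# Factor columns of the development data that also exist in the prediction data.
is_fct <- vapply(dev_data, is.factor, logical(1))
shared <- intersect(names(dev_data)[is_fct], names(pred_data))

for (col in shared) {
  ref <- levels(dev_data[[col]])
  # Assumption: "Other" is already a level in the development data.
  stopifnot("Other" %in% ref)
  # match() compares values, so repeated or unordered entries are handled.
  u <- factor(ref[match(pred_data[[col]], ref)], levels = ref)
  # Only values that were present but unmatched become "Other".
  u[is.na(u) & !is.na(pred_data[[col]])] <- "Other"
  pred_data[[col]] <- u
}

as.character(pred_data$g)  # "a" "b" "Other" "b"
```

After the loop, each shared factor column in pred_data has the same levels as its counterpart in dev_data.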
Upvotes: 1