Georgery

Reputation: 8117

Get Factor Levels of other Vector (and lump non-existent ones)

I have two factors:

foo_1 <- factor(c("a", "b", "c", "Other"))
foo_2 <- factor(c("a", "b", "x"))

I want to recode foo_2 so that

  1. the levels are the same as in foo_1
  2. values of foo_2 that do not exist in levels(foo_1) (here "x") are recoded to the "Other" level.

So, something like

bar(foo_2, foo_1)

[1] a     b     Other
Levels: a b c Other

Background

I am building randomForest() models, and the prediction data can contain levels that do not exist in the development data. When that happens, prediction fails, which is very annoying. (foo_1 is the vector from the development data and foo_2 is the one from the prediction data.) I would bet that others have had the same problem before and that the answer is out there, but I couldn't find it.

I would love a solution using the forcats package, but other ways are also highly welcome.

Thanks in advance.

Upvotes: 0

Views: 64

Answers (3)

Ronak Shah

Reputation: 388907

A simple way would be:

foo_3 <- factor(foo_2, levels = levels(foo_1))
foo_3[is.na(foo_3)] <- 'Other'
foo_3
#[1] a     b     Other
#Levels: a b c Other
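The two steps above could be wrapped into a small helper, sketched here under a few assumptions: the name bar() is borrowed from the call in the question, the `other` argument is hypothetical, and the fallback level is assumed to already exist in the reference factor.

```r
# Hypothetical helper, named after the bar() call in the question.
bar <- function(x, ref, other = "Other") {
  ref_levels <- levels(ref)
  # Assumption: the fallback level already exists in the reference factor.
  stopifnot(other %in% ref_levels)
  # Values that are not in ref_levels become NA here ...
  out <- factor(as.character(x), levels = ref_levels)
  # ... and are then lumped into the fallback level.
  # Note: values that were already NA in x are lumped as well.
  out[is.na(out)] <- other
  out
}

foo_1 <- factor(c("a", "b", "c", "Other"))
foo_2 <- factor(c("a", "b", "x"))
res <- bar(foo_2, foo_1)
res
```

Because the result is built directly on levels(foo_1), it keeps both the level set and the level order of the development data.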

In forcats, fct_other() does almost exactly that, but it does not keep levels that are missing from foo_2 (here "c"), so you have to add them afterwards with fct_expand().

library(forcats)
foo_3 <- fct_other(foo_2, levels(foo_1))
foo_3 <- fct_expand(foo_3, levels(foo_1))
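One caveat, as far as I can tell: fct_expand() adds new levels at the end, so the level order of foo_3 may differ from that of foo_1. If the order matters for your model, a minimal sketch for re-aligning it with base R's factor() could look like this (the final identical() check is only there for illustration):

```r
library(forcats)

foo_1 <- factor(c("a", "b", "c", "Other"))
foo_2 <- factor(c("a", "b", "x"))

foo_3 <- fct_other(foo_2, keep = levels(foo_1))  # "x" is lumped into "Other"
foo_3 <- fct_expand(foo_3, levels(foo_1))        # add the unused level "c"

# Rebuild the factor with the reference level order; the values are unchanged.
foo_3 <- factor(foo_3, levels = levels(foo_1))

# Check that the levels now match the development data exactly.
identical(levels(foo_3), levels(foo_1))
```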

Upvotes: 1

Georgery

Reputation: 8117

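Expanding on Ronak's answer and making it a little more elegant with the magrittr pipe (%>%). Expanding first and lumping second puts "Other" at the end of the levels:

library(forcats)
library(magrittr)  # provides %>%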

foo_2 %>% fct_expand(levels(foo_1)) %>% fct_other(levels(foo_1))

[1] a     b     Other
Levels: a b c Other

Upvotes: 0

ThomasIsCoding

Reputation: 101247

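What about something like the following? It matches the values of foo_2 (not just its levels) against levels(foo_1), so it also works when values repeat or appear in a different order:

> replace(u <- factor(foo_2, levels = levels(foo_1)), is.na(u), "Other")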
[1] a     b     Other
Levels: a b c Other

Upvotes: 1
