Reputation: 57
When I convert a numeric column to a factor with step_num2factor(), every level is created correctly except the last one, which is filled in as NA.
Minimal working example:
test <- tibble(pred = c(0, 1, 2, 3, 4, 5, 8), target = c(0,1,0,1,1,1,0))
It looks like this when printed:
# A tibble: 7 x 2
   pred target
  <dbl>  <dbl>
1     0      0
2     1      1
3     2      0
4     3      1
5     4      1
6     5      1
7     8      0
Applying the recipe steps and comparing the results:
library(recipes)
test <- tibble(pred = c(0, 1, 2, 3, 4, 5, 8), target = c(0,1,0,1,1,1,0))
my_levels <- c("zero", "one", "two", "three", "four", "five", "eight")
recipe(target ~ pred, data = test) %>%
step_num2factor(pred, levels = my_levels, transform = function(x) x + 1) %>%
prep(training = test) %>%
bake(new_data = test)
Remark: the transform is needed because pred contains 0, and a factor cannot have a level at position 0. (source)
Transformed dataset after prepping and baking:
# A tibble: 7 x 2
  pred  target
  <fct>  <dbl>
1 zero       0
2 one        1
3 two        0
4 three      1
5 four       1
6 five       1
7 NA         0
The NA is not supposed to be there; it should be the category "eight". What am I doing wrong?
Remark: I also tried "six" as the last level, in case the function only accepts level names that spell out consecutive numbers rather than arbitrary names, but that did not help either.
Upvotes: 0
Views: 83
Reputation: 3185
You need to make sure that your input, levels, and transform match up perfectly.
You were correct in using transform = function(x) x + 1 since you are trying to capture 0. With that transform, an input of n selects the (n+1)th value of levels. When your input is 8, step_num2factor() returns the 8+1=9th value of levels, which isn't there since levels only has length 7, resulting in the NA you see. The code below should illustrate the issue: with an input of 6 instead of 8, the 7th level, "eight", is selected.
library(recipes)
my_levels <- c("zero", "one", "two", "three", "four", "five", "eight")
test <- tibble(pred = c(0, 1, 2, 3, 4, 5, 6), target = c(0,1,0,1,1,1,0))
recipe(target ~ pred, data = test) %>%
step_num2factor(pred, levels = my_levels, transform = function(x) x + 1) %>%
prep() %>%
bake(new_data = NULL)
#> # A tibble: 7 x 2
#>   pred  target
#>   <fct>  <dbl>
#> 1 zero       0
#> 2 one        1
#> 3 two        0
#> 4 three      1
#> 5 four       1
#> 6 five       1
#> 7 eight      0
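The lookup described above can be reproduced with plain vector indexing. This is only a rough base R sketch of the behaviour, not the actual internals of step_num2factor(); the names idx and looked_up are made up for illustration.

```r
# Levels and input values taken from the question
my_levels <- c("zero", "one", "two", "three", "four", "five", "eight")
pred <- c(0, 1, 2, 3, 4, 5, 8)

# Apply the same shift as transform = function(x) x + 1
idx <- pred + 1

# Use the shifted values as positions in my_levels.
# Position 9 does not exist in a length-7 vector, so R returns NA there.
looked_up <- my_levels[idx]
print(looked_up)
```

The last element comes back as NA because 8 + 1 = 9 points past the end of my_levels.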
To fix your problem, you need to make sure that there are no gaps in my_levels, so that the 9th position exists and holds "eight":
test <- tibble(pred = c(0, 1, 2, 3, 4, 5, 8), target = c(0,1,0,1,1,1,0))
my_levels <- c("zero", "one", "two", "three", "four", "five",
               "six", "seven", "eight", "nine", "ten")
recipe(target ~ pred, data = test) %>%
step_num2factor(pred, levels = my_levels, transform = function(x) x + 1) %>%
prep() %>%
bake(new_data = NULL)
#> # A tibble: 7 x 2
#>   pred  target
#>   <fct>  <dbl>
#> 1 zero       0
#> 2 one        1
#> 3 two        0
#> 4 three      1
#> 5 four       1
#> 6 five       1
#> 7 eight      0
Created on 2021-03-27 by the reprex package (v0.3.0)
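If padding my_levels with unused names is undesirable, one alternative (a different technique from the step_num2factor() approach above) is to map the observed values to labels directly with base R's factor(), which pairs arbitrary values with names through its levels and labels arguments. A minimal sketch, where pred_values and pred_fct are illustrative names:

```r
# Input values from the question
pred <- c(0, 1, 2, 3, 4, 5, 8)

# The distinct values that occur, in the order the labels should follow
pred_values <- c(0, 1, 2, 3, 4, 5, 8)
my_levels <- c("zero", "one", "two", "three", "four", "five", "eight")

# Each entry of pred_values is paired with the label at the same position,
# so 8 maps to "eight" without any filler levels
pred_fct <- factor(pred, levels = pred_values, labels = my_levels)
print(pred_fct)
```

Inside a recipe, the same factor() call could probably be wrapped in step_mutate(), though that combination is an assumption and has not been tested here.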
Upvotes: 2