I recently picked up tidymodels after using R for a few months at school.
I'm trying to build my first model on the Kaggle Titanic dataset, but I run into an error when fitting the model. Could someone help me?
titanic_rec <- recipe(Survived ~ Sex + Age + Pclass + Embarked + Family_Size + Name, data = titanic_train) %>%
  step_impute_knn(all_predictors(), k = 3) %>%
  step_dummy(Sex, Pclass, Embarked, Family_Size, Name) %>%
  step_interact(~ Sex:Age + Sex:Pclass + Pclass:Age)

log_model <- logistic_reg() %>%
  set_engine("glm") %>%
  set_mode("classification")

fitted_log_model <- workflow() %>%
  add_model(log_model) %>%
  add_recipe(titanic_rec) %>%
  fit(data = titanic_train) %>%
  pull_workflow_fit() %>%
  tidy()
Every feature is a factor except Age and Survived, which are doubles. The error appears as soon as I add fit(data = ...) and the steps after it:
Error: Can't rename variables in this context. Run `rlang::last_error()` to see where the error occurred.
24. stop(fallback)
23. signal_abort(cnd)
22. abort("Can't rename variables in this context.")
21. eval_select_recipes(to_impute, training, info)
20. impute_var_lists(to_impute = x$terms, impute_using = x$impute_with, training = training, info = info)
19. prep.step_impute_knn(x$steps[[i]], training = training, info = x$term_info)
18. prep(x$steps[[i]], training = training, info = x$term_info)
17. prep.recipe(blueprint$recipe, training = data, fresh = blueprint$fresh)
16. recipes::prep(blueprint$recipe, training = data, fresh = blueprint$fresh)
15. blueprint$mold$process(blueprint = blueprint, data = data)
14. run_mold.recipe_blueprint(blueprint, data)
13. run_mold(blueprint, data)
12. mold.recipe(recipe, data, blueprint = blueprint)
11. hardhat::mold(recipe, data, blueprint = blueprint)
10. fit.action_recipe(action, workflow = workflow, data = data)
9. fit(action, workflow = workflow, data = data)
8. .fit_pre(workflow, data)
7. fit.workflow(., data = titanic_train)
6. fit(., data = titanic_train)
5. is_workflow(x)
4. validate_is_workflow(x)
3. pull_workflow_fit(.)
2. tidy(.)
1. workflow() %>% add_model(log_model) %>% add_recipe(titanic_rec) %>% fit(data = titanic_train) %>% pull_workflow_fit() %>% tidy()
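The posted error comes from step_impute_knn() (the traceback shows it failing inside prep.step_impute_knn()). The number of neighbors has to be given with the neighbors argument, not k. Anything that lands in the step's ... is treated as a column selector, and a named selector such as k = 3 is read as a request to rename a column, which recipes does not allow; hence "Can't rename variables in this context".

Secondly, I would advise against using Name as a predictor. Practically every passenger has a different name, so step_dummy() creates a separate dummy variable for almost every row, which leaves the model with roughly as many coefficients as observations and lets it memorize the training data instead of learning anything that generalizes.

Also note that Survived must be a factor for a classification model (parsnip raises an error for a numeric outcome), so it has to be converted before the recipe is built.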
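A minimal sketch of the first two points on tiny made-up data frames (df and toy, and all their values, are invented for illustration and are not the Kaggle file): the first recipe imputes a missing Age with neighbors = 3, and the second shows step_dummy() turning a column with one unique name per row into one indicator column per row, minus the reference level.

```r
library(recipes)

# Invented toy data with one missing Age value (illustration only)
df <- data.frame(
  Survived = factor(c(0, 1, 1, 0, 1, 0)),
  Sex      = factor(c("male", "female", "female", "male", "female", "male")),
  Age      = c(22, 38, NA, 35, 27, 54)
)

# The neighbor count goes to `neighbors`; only column selectors go in `...`
impute_rec <- recipe(Survived ~ Sex + Age, data = df) %>%
  step_impute_knn(all_predictors(), neighbors = 3)

imputed <- bake(prep(impute_rec), new_data = NULL)
print(imputed$Age)  # the NA is replaced by a value derived from 3 neighbors

# Invented names, unique per row like the real Name column
toy <- data.frame(
  Survived = factor(c(0, 1, 1, 0, 1)),
  Name     = factor(paste0("Passenger_", 1:5))
)

dummies <- recipe(Survived ~ Name, data = toy) %>%
  step_dummy(Name) %>%
  prep() %>%
  bake(new_data = NULL)

# 5 rows give 4 indicator columns: one per passenger except the reference level
print(names(dummies))
```

With the real data the same thing happens at full scale, which is why Name is left out of the recipe.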
The last problem is in step_interact(). You can't use step_interact(~ Sex:Age) after step_dummy(Sex), because once step_dummy() has run there is no longer a column named Sex. Instead there is a column Sex_male (female is the reference level, so it is absorbed into the intercept). A way to catch all the created dummy variables is to use starts_with() inside step_interact().
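To see which columns step_interact() actually receives, a small sketch (again with an invented df) can bake the recipe and list the column names. After step_dummy(Sex) only Sex_male exists, and selecting it by prefix produces a product column named Sex_male_x_Age, the same naming that appears in the coefficient table further down.

```r
library(recipes)

# Invented toy data with no missing values (illustration only)
df <- data.frame(
  Survived = factor(c(0, 1, 1, 0, 1, 0)),
  Sex      = factor(c("male", "female", "female", "male", "female", "male")),
  Age      = c(22, 38, 26, 35, 27, 54)
)

baked <- recipe(Survived ~ Sex + Age, data = df) %>%
  step_dummy(Sex) %>%                           # Sex becomes Sex_male ("female" is the reference)
  step_interact(~ starts_with("Sex_"):Age) %>%  # match the dummy by prefix, not by "Sex"
  prep() %>%
  bake(new_data = NULL)

print(names(baked))  # the original Sex column is gone
```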
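library(tidymodels)

# Replace with the location of Kaggle's train.csv
titanic_train <- readr::read_csv("your/path/to/data/train.csv")

# Classification needs a factor outcome; Pclass is treated as categorical,
# and Family_Size is built as a numeric count from SibSp and Parch
titanic_train <- titanic_train %>%
  mutate(Survived = factor(Survived),
         Pclass = factor(Pclass),
         Family_Size = SibSp + Parch + 1)

# Name is dropped; step_impute_knn() takes the neighbor count via `neighbors`
titanic_rec <- recipe(Survived ~ Sex + Age + Pclass + Embarked + Family_Size,
                      data = titanic_train) %>%
  step_impute_knn(all_predictors(), neighbors = 3) %>%
  step_dummy(Sex, Pclass, Embarked) %>%
  # After step_dummy() the columns are Sex_male, Pclass_X2, ..., so match by prefix
  step_interact(~ starts_with("Sex_"):Age +
                  starts_with("Sex_"):starts_with("Pclass_") +
                  starts_with("Pclass_"):Age)

log_model <- logistic_reg() %>%
  set_engine("glm") %>%
  set_mode("classification")

# Fit the workflow, pull out the parsnip model fit and tidy its coefficients
fitted_log_model <- workflow() %>%
  add_model(log_model) %>%
  add_recipe(titanic_rec) %>%
  fit(data = titanic_train) %>%
  pull_workflow_fit() %>%
  tidy()

fitted_log_model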
#> # A tibble: 13 x 5
#> term estimate std.error statistic p.value
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 (Intercept) 3.85 0.921 4.18 0.0000289
#> 2 Age 0.0117 0.0226 0.516 0.606
#> 3 Family_Size -0.226 0.0671 -3.36 0.000769
#> 4 Sex_male -2.22 0.886 -2.50 0.0124
#> 5 Pclass_X2 1.53 1.16 1.31 0.189
#> 6 Pclass_X3 -2.42 0.884 -2.74 0.00615
#> 7 Embarked_Q -0.0461 0.368 -0.125 0.900
#> 8 Embarked_S -0.548 0.243 -2.26 0.0241
#> 9 Sex_male_x_Age -0.0488 0.0199 -2.46 0.0140
#> 10 Sex_male_x_Pclass_X2 -1.28 0.879 -1.46 0.144
#> 11 Sex_male_x_Pclass_X3 1.48 0.699 2.11 0.0347
#> 12 Age_x_Pclass_X2 -0.0708 0.0263 -2.69 0.00714
#> 13 Age_x_Pclass_X3 -0.0341 0.0209 -1.63 0.103
Created on 2021-07-01 by the reprex package (v2.0.0)
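In more recent workflows releases, pull_workflow_fit() is soft-deprecated in favour of extract_fit_parsnip(). A self-contained sketch of the same fit-then-tidy pattern with the newer function, using R's built-in mtcars data as a stand-in for the Titanic file (the am outcome, the wt and hp predictors, and the cars and cars_rec names are choices made only for this illustration):

```r
library(tidymodels)

# Stand-in data: binary outcome am (converted to a factor) with two numeric predictors
cars <- mtcars %>% mutate(am = factor(am))

cars_rec <- recipe(am ~ wt + hp, data = cars)

log_model <- logistic_reg() %>%
  set_engine("glm") %>%
  set_mode("classification")

coefs <- workflow() %>%
  add_model(log_model) %>%
  add_recipe(cars_rec) %>%
  fit(data = cars) %>%
  extract_fit_parsnip() %>%   # newer replacement for pull_workflow_fit()
  tidy()

print(coefs)
```

Only the extraction step changes; the recipe, model specification and tidy() call stay the same as in the answer.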