How to incorporate tidymodels PCA into a model workflow and make predictions

Question

I am trying to incorporate tidymodels PCA into a model workflow. I want a predictive model that uses PCA as a preprocessing step, and I then want to make predictions with that model.

I have tried the following approach:

diamonds <- diamonds %>%
  select(-clarity, -cut, -color)

diamonds_split <- initial_split(diamonds, prop = 4/5)

diamonds_train <- training(diamonds_split)
diamonds_test <- testing(diamonds_split)

diamonds_test <- vfold_cv(diamonds_train)

diamonds_recipe <-
  # The basic formula and all the data (outcome ~ predictors)
  recipe(price ~ ., data = diamonds_train) %>%
  step_log(all_outcomes(), skip = TRUE) %>%
  step_normalize(all_predictors(), -all_nominal()) %>%
  step_pca(all_predictors())

preprocesados <- prep(diamonds_recipe)

linear_model <- 
  linear_reg() %>%
  set_engine("glmnet") %>%
  set_mode("regression")

pca_workflow <- workflow() %>%
  add_recipe(diamonds_recipe) %>%
  add_model(linear_model)

lr_fitted_workflow <- pca_workflow %>%  # option A: workflow, full dataset
  last_fit(diamonds_split)

performance <- lr_fitted_workflow %>% collect_metrics()

test_predictions <- lr_fitted_workflow %>% collect_predictions()

But I get this error:

x Resample1: model (predictions): Error: penalty should be a single numeric value. ... Warning message: “All models failed in [fit_resamples()]. See the .notes column.”
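
For context, this message usually appears because the glmnet engine needs one concrete penalty value at prediction time, while the linear_reg() specification above leaves penalty unset. A minimal sketch of two possible ways around it, assuming the tidymodels package is installed (and glmnet, if the first option is later fitted); the value 0.01 is an arbitrary placeholder, not a tuned choice, and the object names are illustrative only:

```r
library(tidymodels)

# Option 1: keep glmnet, but supply a single fixed penalty.
# 0.01 is an arbitrary, untuned value used only for illustration.
glmnet_model <-
  linear_reg(penalty = 0.01, mixture = 1) %>%
  set_engine("glmnet") %>%
  set_mode("regression")

# Option 2: ordinary least squares, which has no penalty to specify.
lm_model <-
  linear_reg() %>%
  set_engine("lm") %>%
  set_mode("regression")
```

Either specification could be passed to add_model() in place of linear_model; in practice the penalty would normally be chosen by tuning rather than fixed by hand.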

Following other tutorials, I tried a different approach, but I don't know how to use the model to make new predictions, because the new data arrives in its original (non-PCA) form. So I tried this:

pca_fit <- juice(preprocesados) %>%  #option C no work flow at all
  lm(price ~ ., data = .)

prep_test <- prep(diamonds_recipe, new_data = diamonds_test)

truths <- juice(prep_test) %>%
          select(price)

ans <- predict(pca_fit, new_data = prep_test)

tib <- tibble(row = 1:length(ans),ans, truths)

ggplot(data = tib) +
  geom_smooth(mapping = aes(x = row, y = ans, colour = "predicted")) +
  geom_smooth(mapping = aes(x = row, y = price, colour = "true"))

And it prints something that seems reasonable, but by this point I have lost confidence, and some guidance would be much appreciated. :D
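
One way the goal described here is commonly approached is to fit the whole workflow on the training data and then call predict() on new data in its original form, so that the workflow applies the normalisation and PCA itself. The following is only a sketch under stated assumptions: it uses the "lm" engine instead of glmnet, leaves out the log transform of price for brevity, fixes num_comp = 3 and the seed arbitrarily, and introduces the names diamonds_num, pca_recipe, pca_fit and test_pcs purely for illustration.

```r
library(tidymodels)

set.seed(123)  # arbitrary seed, only so the split is reproducible

# Numeric columns only, as in the question
diamonds_num <- ggplot2::diamonds %>%
  select(-clarity, -cut, -color)

diamonds_split <- initial_split(diamonds_num, prop = 4/5)
diamonds_train <- training(diamonds_split)
diamonds_test  <- testing(diamonds_split)

# Recipe: normalise the predictors, then replace them with principal components
pca_recipe <-
  recipe(price ~ ., data = diamonds_train) %>%
  step_normalize(all_predictors()) %>%
  step_pca(all_predictors(), num_comp = 3)

pca_workflow <- workflow() %>%
  add_recipe(pca_recipe) %>%
  add_model(linear_reg() %>% set_engine("lm"))

# fit() estimates the recipe and the model on the training data only
pca_fit <- fit(pca_workflow, data = diamonds_train)

# predict() takes new data in its original (non-PCA) form;
# the fitted workflow applies the trained preprocessing first
test_predictions <- predict(pca_fit, new_data = diamonds_test) %>%
  bind_cols(diamonds_test %>% select(price))

rmse(test_predictions, truth = price, estimate = .pred)

# Without a workflow, the same preprocessing would be applied by hand
# with prep() on the training data and bake() on the new data:
prepped  <- prep(pca_recipe, training = diamonds_train)
test_pcs <- bake(prepped, new_data = diamonds_test)
```

In this sketch the held-out rows stay in diamonds_test and are never overwritten by a resampling object, and the components are estimated from the training data and only applied to the test data.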
