Dooms31

Reputation: 15

Workflow Tidymodels Formula Object

I am fairly new to R and am following a guide to build an Expected Goals model for my hockey league. When I run the code below, I get the error shown at the bottom. Is there something simple that I am missing?

It seems like it's trying to use a formula in the model portion of the workflow, but I already have a recipe in there. Thanks in advance for any help anyone can offer! The guide is here: https://www.thesignificantgame.com/portfolio/expected-goals-model-with-tidymodels/

library(tidymodels)
library(tidyverse)
library(dplyr)

set.seed(1972)
train_test_split <- initial_split(data = EXPECTED_GOALS_MODEL, prop = 0.80)
train_data <- train_test_split %>% training() 
test_data  <- train_test_split %>% testing()
    
xg_recipe <- recipe(Goal ~ DistanceC + Angle + Home + Hand + AgeDec31 + GoalieAgeDec31 + NewX + NewY, data = train_data) %>% update_role(NewX, NewY, new_role = "ID")
    
model <- logistic_reg() %>% set_engine("glm")
    
xg_wflow <- workflow() %>% add_model(model) %>% add_recipe(xg_recipe)

xg_wflow
    
xg_fit <- xg_wflow %>% fit(data = train_data)

Error in validObject(.Object) : 
  invalid class “model” object: invalid object for slot "formula" in class "model": got class "workflow", should be or extend class "formula"
In addition: Warning message:
In fit(., data = train_data) :
  fit failed: Error in as.matrix(y) : argument "y" is missing, with no default
 fit(x = ., data = train_data) 

Upvotes: 0

Views: 236

Answers (1)

Simon Couch

Reputation: 531

It's difficult to tell exactly what the issue is without a reproducible example, though this error brings up a few questions for me:

  • Does the EXPECTED_GOALS_MODEL data indeed have a column called Goal in it, with two unique levels? Have you also spelled the remainder of the column names correctly?
  • Are your tidymodels package installs up to date?
  • Does this error persist if you run specifically generics::fit(data = train_data) instead of fit(data = train_data)? This almost looks like a different fit() is being dispatched to.
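As a rough way to work through the first and third questions, here is a minimal sketch in base R. The `shots` data frame and the `needed` vector are made-up stand-ins for EXPECTED_GOALS_MODEL and the recipe's column names; swap in your own. `utils::find()` lists every attached environment that defines an object with the given name, so more than one entry for "fit" hints that masking could be involved.

```r
# Hypothetical stand-in for EXPECTED_GOALS_MODEL (not the asker's real data)
shots <- data.frame(
  Goal      = factor(c("no", "yes", "no", "no")),
  DistanceC = c(10.2, 35.1, 22.4, 8.7),
  Angle     = c(12, 40, 5, 30)
)

# Columns the recipe formula refers to (shortened here for illustration)
needed <- c("Goal", "DistanceC", "Angle")

# Any names in the formula that are absent from the data?
missing_cols <- setdiff(needed, names(shots))
print(missing_cols)

# logistic_reg() expects a factor outcome with two levels
n_levels <- nlevels(shots$Goal)
print(n_levels)

# Which attached packages define something called fit()?
# Run this after your library() calls to see what is on the search path.
fit_sources <- utils::find("fit")
print(fit_sources)
```

If `missing_cols` is non-empty or `n_levels` is not 2, the problem is likely in the data rather than in the workflow.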

Here's a place to start with a reprex:

library(tidymodels)
data(ames)

set.seed(1972)
ames <- ames %>% rowid_to_column()
train_test_split <- initial_split(data = ames, prop = 0.80)
train_data <- train_test_split %>% training() 
test_data  <- train_test_split %>% testing()

xg_recipe <- recipe(Sale_Price ~ ., data = train_data) %>% update_role(rowid, new_role = "ID")

model <- linear_reg() %>% set_engine("glm")

xg_wflow <- workflow() %>% add_model(model) %>% add_recipe(xg_recipe)

xg_fit <- xg_wflow %>% fit(data = train_data)

xg_fit
#> ══ Workflow [trained] ══════════════════════════════════════════════════════════
#> Preprocessor: Recipe
#> Model: linear_reg()
#> 
#> ── Preprocessor ────────────────────────────────────────────────────────────────
#> 0 Recipe Steps
#> 
#> ── Model ───────────────────────────────────────────────────────────────────────
#> 
#> Call:  stats::glm(formula = ..y ~ ., family = stats::gaussian, data = data)
#> 
#> Coefficients:
#>                                          (Intercept)  
#>                                           -2.583e+07  
#>                  MS_SubClassOne_Story_1945_and_Older  
#>                                            7.419e+03  
#>    MS_SubClassOne_Story_with_Finished_Attic_All_Ages  
#>                                            1.562e+04  
#>    MS_SubClassOne_and_Half_Story_Unfinished_All_Ages  
#>                                            1.060e+04  
#>      MS_SubClassOne_and_Half_Story_Finished_All_Ages  
#>                                            8.413e+03  
#>                  MS_SubClassTwo_Story_1946_and_Newer  
#>                                            3.007e+03  
#>                  MS_SubClassTwo_Story_1945_and_Older  
#>                                            1.793e+04  
#>               MS_SubClassTwo_and_Half_Story_All_Ages  
#>                                           -3.909e+03  
#>                       MS_SubClassSplit_or_Multilevel  
#>                                           -1.098e+04  
#>                               MS_SubClassSplit_Foyer  
#>                                           -4.038e+03  
#>                MS_SubClassDuplex_All_Styles_and_Ages  
#>                                           -2.004e+04  
#>              MS_SubClassOne_Story_PUD_1946_and_Newer  
#>                                           -2.335e+04  
#>           MS_SubClassOne_and_Half_Story_PUD_All_Ages  
#>                                           -2.482e+04  
#>              MS_SubClassTwo_Story_PUD_1946_and_Newer  
#>                                           -1.794e+04  
#>          MS_SubClassPUD_Multilevel_Split_Level_Foyer  
#>                                           -2.098e+04  
#> MS_SubClassTwo_Family_conversion_All_Styles_and_Ages  
#>                                            6.903e+03  
#>                    MS_ZoningResidential_High_Density  
#>                                           -3.853e+03  
#>                     MS_ZoningResidential_Low_Density  
#>                                           -3.661e+03  
#>                  MS_ZoningResidential_Medium_Density  
#>                                           -8.240e+03  
#>                                       MS_ZoningA_agr  
#>                                           -3.824e+03  
#>                                       MS_ZoningC_all  
#>                                           -1.800e+04  
#>                                       MS_ZoningI_all  
#>                                           -3.299e+04  
#>                                         Lot_Frontage  
#>                                            1.336e+01  
#> 
#> ...
#> and 506 more lines.

Created on 2022-09-28 by the reprex package (v2.0.1)
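If a different fit() does turn out to be the one being called, one possible workaround is to name the namespace explicitly. The sketch below is only an illustration under that assumption: it uses the built-in mtcars data and the "lm" engine rather than your data, and the `car_*` names are made up for the example.

```r
library(tidymodels)

# mtcars ships with R; it stands in for the real training data here.
car_recipe <- recipe(mpg ~ wt + hp, data = mtcars)
car_model  <- linear_reg() %>% set_engine("lm")
car_wflow  <- workflow() %>% add_model(car_model) %>% add_recipe(car_recipe)

# Calling generics::fit() directly bypasses any other fit() that a
# later-attached package may have placed ahead of it on the search path.
car_fit <- generics::fit(car_wflow, data = mtcars)

car_fit
```

If the namespaced call works where the plain fit() call fails, checking which packages are attached after tidymodels would be the next thing to look at.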

Hope this helps!

Simon, tidymodels team

Upvotes: 1
