maldini425


Random Forest Error in na.fail.default: missing values in object

I am running a random forest (RF) model that runs without errors with most variables. However, when I include one more variable, duration_in_program, using the following code:

```{r Random Forest Model}
## Run a Random Forest model
mod_rf <-
  train(left_school ~ job_title + gender + marital_status + age_at_enrollment +
          monthly_wage + educational_qualification +
          cityA + cityB + cityC + cityD + duration_in_program, # outcome ~ predictors
        data = train_data,              # training data
        method = "ranger",              # random forest (ranger is much faster than rf)
        metric = "ROC",                 # area under the ROC curve
        trControl = control_conditions,
        tuneGrid = tune_mtry
  )
mod_rf
```

I get the following error:

```
Error in na.fail.default(list(left_welfare = c(1L, 2L, 2L, 2L, 2L, 2L, : missing values in object
```
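Because the error only shows up once duration_in_program is added, a quick first check is to count the missing values in each column. This is a minimal sketch on a small made-up data frame, `toy_data` (hypothetical values; only the column names echo the question). On the real data you would run the last few lines against `train_data` instead.

```r
# Hypothetical stand-in for train_data; values are made up for illustration
toy_data <- data.frame(
  left_school = factor(c("yes", "no", "no", "yes", "no")),
  monthly_wage = c(1200, 950, 1100, 1300, 1000),
  duration_in_program = c(12, NA, 6, NA, 9)
)

# Number of NA values in each column
na_counts <- colSums(is.na(toy_data))
print(na_counts)

# Columns that would make na.fail() stop with "missing values in object"
print(names(na_counts)[na_counts > 0])

# Number of rows with at least one NA (the rows na.omit() would drop)
n_incomplete <- sum(!complete.cases(toy_data))
print(n_incomplete)
```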


Answers (1)

tim


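Assuming train() is from caret, you can pass a function that handles NAs through the na.action argument of its formula interface. The default is na.fail, which stops with exactly this error as soon as any variable in the formula contains an NA; since the error only appears once duration_in_program is included, that column presumably has missing values. A very common alternative is na.omit, which drops incomplete rows. The randomForest package also provides na.roughfix, which will "Impute Missing Values by median/mode."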

```r
library(caret)  # assumes train() comes from caret

mod_rf <-
  train(left_school ~ job_title + gender + marital_status + age_at_enrollment +
          monthly_wage + educational_qualification +
          cityA + cityB + cityC + cityD + duration_in_program, # outcome ~ predictors
        data = train_data,              # training data
        method = "ranger",              # random forest (ranger is much faster than rf)
        metric = "ROC",                 # area under the ROC curve
        trControl = control_conditions,
        tuneGrid = tune_mtry,
        na.action = na.omit             # drop rows with NAs instead of failing
  )
mod_rf
```
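To see what the na.action choices do before refitting the model, this base-R sketch runs them on a small made-up data frame `df` (hypothetical data). `rough_impute()` is a hypothetical helper written in the spirit of randomForest::na.roughfix (column median for numeric columns, most frequent level for factors); it is not that function, which breaks ties differently. With randomForest loaded, `na.action = na.roughfix` should also be usable directly.

```r
# Toy data with missing values (hypothetical; stands in for train_data)
df <- data.frame(
  left_school = factor(c("yes", "no", "no", "yes", "no", "no")),
  gender = factor(c("f", "m", NA, "f", "m", "m")),
  duration_in_program = c(12, NA, 6, NA, 9, 3)
)

# na.fail(): the default in caret's formula interface; stops on any NA
fail_msg <- tryCatch(na.fail(df), error = function(e) conditionMessage(e))
print(fail_msg)

# na.omit(): drops every row that has an NA in any column
df_omit <- na.omit(df)
print(nrow(df_omit))

# Hypothetical median/mode imputation, similar in spirit to na.roughfix
rough_impute <- function(data) {
  for (col in names(data)) {
    x <- data[[col]]
    if (!anyNA(x)) next
    if (is.numeric(x)) {
      x[is.na(x)] <- median(x, na.rm = TRUE)           # numeric: column median
    } else if (is.factor(x)) {
      counts <- table(x)                               # table() ignores NAs by default
      x[is.na(x)] <- names(counts)[which.max(counts)]  # factor: most frequent level
    }
    # other column types are left unchanged
    data[[col]] <- x
  }
  data
}

df_imputed <- rough_impute(df)
print(df_imputed)
```

The trade-off: na.omit keeps the data honest but can shrink the training set a lot if duration_in_program is often missing, while median/mode imputation keeps every row at the cost of filling in guessed values.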
