Reputation: 317
I am running a random forest (RF) model that runs without errors for most variables. However, when I include one variable, `duration_in_program`, using the following code:
```{r Random Forest Model}
## Run a Random Forest model
mod_rf <-
  train(left_school ~ job_title + gender + marital_status + age_at_enrollment +
          monthly_wage + educational_qualification + cityD + cityC. +
          cityB + cityA + duration_in_program, # Equation (outcome and everything else)
        data = train_data,  # Training data
        method = "ranger",  # random forest (ranger is much faster than rf)
        metric = "ROC",     # area under the curve
        trControl = control_conditions,
        tuneGrid = tune_mtry
  )
mod_rf
```
I get the following error:
Error in na.fail.default(list(left_welfare = c(1L, 2L, 2L, 2L, 2L, 2L, : missing values in object
Upvotes: 0
Views: 2631
Reputation: 901
Assuming `train()` is from caret, you can specify a function to handle NAs with the `na.action` parameter. The default is `na.fail`. A very common one is `na.omit`. The randomForest library has `na.roughfix` that will "Impute Missing Values by median/mode."
```{r}
mod_rf <-
  train(left_school ~ job_title + gender + marital_status + age_at_enrollment +
          monthly_wage + educational_qualification + cityD + cityC. +
          cityB + cityA + duration_in_program, # Equation (outcome and everything else)
        data = train_data,  # Training data
        method = "ranger",  # random forest (ranger is much faster than rf)
        metric = "ROC",     # area under the curve
        trControl = control_conditions,
        tuneGrid = tune_mtry,
        na.action = na.omit # drop rows with missing values instead of failing
  )
mod_rf
```
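Before picking an `na.action`, it can help to check which columns actually contain missing values and how many rows `na.omit` would drop. Here is a minimal base-R sketch of that check. The `toy` data frame and its values are made up for illustration and only stand in for `train_data`; the manual median fill at the end is a rough analogue of what `na.roughfix` does for a numeric column, not its actual implementation.

```r
# Hypothetical stand-in for train_data (values invented for illustration)
toy <- data.frame(
  left_school         = factor(c("yes", "no", "no", "yes", "no")),
  age_at_enrollment   = c(21, 34, 28, 45, 30),
  duration_in_program = c(12, NA, 7, NA, 3)
)

# Count missing values per column to see which variable triggers na.fail
na_counts <- colSums(is.na(toy))
print(na_counts)

# na.omit drops every row that has at least one NA in any column
toy_complete <- na.omit(toy)
rows_dropped <- nrow(toy) - nrow(toy_complete)
print(rows_dropped)

# Alternative: fill a numeric column with its median instead of dropping rows
toy_imputed <- toy
fill_value <- median(toy$duration_in_program, na.rm = TRUE)
toy_imputed$duration_in_program[is.na(toy_imputed$duration_in_program)] <- fill_value
print(anyNA(toy_imputed))
```

If `rows_dropped` is a large share of the data, imputing is probably a better choice than `na.omit`, since dropping rows shrinks the training set.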
Upvotes: 1