Reputation: 83
I am trying to calculate test metrics for binary classification in R. I think my code is correct, but I keep running into an error.
The goal is to train a decision tree model that helps detect diabetes given the other variables, then classify using a probability cutoff of 0.50 for the positive class.
The following is my code:
# load packages
library("mlbench")
library("tibble")
library("rpart")
# set seed
set.seed(457)
# load data, remove NA rows, coerce to tibble
data("PimaIndiansDiabetes2")
diabetes = as_tibble(na.omit(PimaIndiansDiabetes2))
# split data
dbt_trn_idx = sample(nrow(diabetes), size = 0.8 * nrow(diabetes))
dbt_trn = diabetes[dbt_trn_idx, ]
dbt_tst = diabetes[-dbt_trn_idx, ]
# check data
dbt_trn
# fit models
mod_tree = rpart(diabetes ~ ., dbt_trn)
# get predicted probabilities for "positive" class, always use second alphabetically for +
prob_tree = predict(mod_tree, dbt_trn)[,"glucose"]
# create tibble of results for tree
results = tibble(
actual = dbt_tst$diabetes,
prob_tree = prob_tree,
)
# evaluate tree with various metrics
tree_eval = evaluate(
data = results,
target_col = "actual",
prediction_cols = "prob_tree",
positive = "diabetes",
type = "binomial",
metrics = list("Accuracy" = TRUE),
cutoff = 0.5)
tree_eval
I keep getting this error: "Error in predict(mod_tree, dbt_trn)[, "glucose"] : subscript out of bounds"
I am unsure how to fix this. Any help would be great!
Upvotes: 2
Views: 171
Reputation: 9107
The result of the predict call has columns pos and neg. Also, it looks like you meant to predict on the test set.
prob_tree should be defined like this:
prob_tree = predict(mod_tree, dbt_tst)[,"pos"]
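To see why "glucose" is out of bounds, it can help to look at the column names of the prediction matrix directly. Below is a minimal sketch that reuses the data and split from the question, checks those column names, and then applies the 0.50 cutoff by hand. The names prob_mat, pred_tree and acc_tree are made up for illustration, and the accuracy is computed in base R rather than with evaluate(), since the question does not show which package that function comes from. Using > rather than >= at the cutoff is also an assumption.

```r
# load packages (same as in the question, minus tibble, which is not needed here)
library("mlbench")
library("rpart")

# same seed, data, and 80/20 split as in the question
set.seed(457)
data("PimaIndiansDiabetes2")
diabetes = na.omit(PimaIndiansDiabetes2)
dbt_trn_idx = sample(nrow(diabetes), size = floor(0.8 * nrow(diabetes)))
dbt_trn = diabetes[dbt_trn_idx, ]
dbt_tst = diabetes[-dbt_trn_idx, ]

# fit the tree on the training data
mod_tree = rpart(diabetes ~ ., data = dbt_trn)

# for a classification tree, predict() returns one probability column per class
prob_mat = predict(mod_tree, dbt_tst)
colnames(prob_mat)  # class labels of the response, not predictor names

# probability of the positive class on the test set
prob_tree = prob_mat[, "pos"]

# apply the 0.50 cutoff (assumption: strictly greater than 0.5 means "pos")
pred_tree = factor(ifelse(prob_tree > 0.5, "pos", "neg"),
                   levels = levels(dbt_tst$diabetes))

# test accuracy: share of test rows where the predicted class matches the actual one
acc_tree = mean(pred_tree == dbt_tst$diabetes)
acc_tree
```

Because the columns are named after the levels of the response (diabetes), the positive argument in the evaluate call would presumably also need to be "pos" rather than "diabetes".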
Upvotes: 2