Statuser123

Reputation: 83

How to solve Error in predict() : subscript out of bounds in R when doing binary classification?

I am trying to calculate test metrics for a binary classifier in R. I think my code is correct, but I keep running into an error.

The goal is to train a decision tree that helps detect diabetes given the other variables, then classify using a probability cutoff of 0.50 for the positive class.

The following is my code:

# load packages
library("mlbench")
library("tibble")
library("rpart")
library("cvms")    # provides evaluate(), used below

# set seed 
set.seed(457)

# load data, remove NA rows, coerce to tibble
data("PimaIndiansDiabetes2")
diabetes = as_tibble(na.omit(PimaIndiansDiabetes2))

# split data
dbt_trn_idx = sample(nrow(diabetes), size = 0.8 * nrow(diabetes))
dbt_trn = diabetes[dbt_trn_idx, ]
dbt_tst = diabetes[-dbt_trn_idx, ]

# check data
dbt_trn

# fit models
mod_tree = rpart(diabetes ~ ., dbt_trn)

# get predicted probabilities for "positive" class, always use second alphabetically for +
prob_tree = predict(mod_tree, dbt_trn)[,"glucose"]

# create tibble of results for tree
results = tibble(
  actual   = dbt_tst$diabetes,
  prob_tree  = prob_tree,
)

# evaluate tree with various metrics
tree_eval = evaluate(
  data = results,
  target_col = "actual",
  prediction_cols = "prob_tree",
  positive = "diabetes",
  type = "binomial",
  metrics = list("Accuracy" = TRUE),
  cutoff = 0.5)

tree_eval 

I keep getting this error: "Error in predict(mod_tree, dbt_trn)[, "glucose"] : subscript out of bounds"

I am unsure how to fix this. Any help would be great!

Upvotes: 2

Views: 171

Answers (1)

Paul

Reputation: 9107

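For a classification tree, the predict call returns a matrix of class probabilities with one column per level of the response, here neg and pos. There is no glucose column, which is why the subscript is out of bounds. Also, it looks like you meant to predict on the test set, since the results tibble pairs the probabilities with dbt_tst$diabetes.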

prob_tree should be defined like this:

prob_tree = predict(mod_tree, dbt_tst)[,"pos"]
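
If it helps to see why the column name matters, here is a minimal sketch on made-up data. It assumes only that rpart is installed; the toy data frame, the variable names ending in _trn and _tst, and the strict "greater than 0.5" tie rule are my own choices for illustration, not part of the original post.

```r
library(rpart)

set.seed(457)

# Hypothetical stand-in data (NOT PimaIndiansDiabetes2): one numeric
# predictor and a factor outcome with levels "neg" and "pos".
n <- 200
toy <- data.frame(glucose = rnorm(n, mean = 120, sd = 30))
toy$diabetes <- factor(
  ifelse(toy$glucose + rnorm(n, sd = 15) > 130, "pos", "neg"),
  levels = c("neg", "pos")
)

# Same style of 80/20 split as in the question.
trn_idx <- sample(nrow(toy), size = 0.8 * nrow(toy))
toy_trn <- toy[trn_idx, ]
toy_tst <- toy[-trn_idx, ]

mod <- rpart(diabetes ~ ., data = toy_trn)

# For a classification tree, predict() returns class probabilities by
# default: a matrix with one column per level of the response.
probs <- predict(mod, toy_tst)
colnames(probs)

# Indexing by a predictor name fails because no such column exists.
err <- tryCatch(probs[, "glucose"], error = function(e) conditionMessage(e))
err

# Indexing by the positive level works: one probability per test row.
prob_pos <- probs[, "pos"]

# Apply the 0.5 cutoff by hand (assumption: exactly 0.5 counts as "neg").
pred_class <- factor(ifelse(prob_pos > 0.5, "pos", "neg"),
                     levels = c("neg", "pos"))
accuracy <- mean(pred_class == toy_tst$diabetes)
accuracy
```

Checking colnames() on the predict output before indexing is a quick way to catch this kind of mistake. It also confirms which label to treat as the positive class in later steps.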

Upvotes: 2
