Kyle Dixon

Reputation: 305

caret::predict giving Error: $ operator is invalid for atomic vectors

This has been driving me crazy. I've been looking through similar posts all day but can't solve my problem. I have a naive Bayes model trained and stored as model. I'm attempting to predict with a newdata data frame, but I keep getting the error Error: $ operator is invalid for atomic vectors. Here is what I am running:

stats::predict(model, newdata = newdata)

where newdata is the first row of another data frame:

newdata <- pbp[1, c("balls", "strikes", "outs_when_up", "stand", "pitcher", "p_throws", "inning")]

class(newdata) gives [1] "tbl_df" "tbl" "data.frame".

Upvotes: 1

Views: 1274

Answers (1)

akrun

Reputation: 887501

The issue is with the data used: it should match the factor levels used in training. For example, if we use one of the rows from trainingData to predict, it works:

predict(model, head(model$trainingData, 1))
#[1] Curveball
#Levels: Changeup Curveball Fastball Sinker Slider

Checking the str of both datasets shows that some columns that are factors in the training data are character or integer in newdata (stand, p_throws, and pitcher):

str(model$trainingData)
'data.frame':   1277525 obs. of  7 variables:
 $ pitcher     : Factor w/ 1390 levels "112526","115629",..: 277 277 277 277 277 277 277 277 277 277 ...
 $ stand       : Factor w/ 2 levels "L","R": 1 1 2 2 2 2 2 1 1 1 ...
 $ p_throws    : Factor w/ 2 levels "L","R": 2 2 2 2 2 2 2 2 2 2 ...
 $ balls       : num  0 1 0 1 2 2 2 0 0 0 ...
 $ strikes     : num  0 0 0 0 0 1 2 0 1 2 ...
 $ outs_when_up: num  1 1 1 1 1 1 1 2 2 2 ...
 $ .outcome    : Factor w/ 5 levels "Changeup","Curveball",..: 3 4 1 4 1 5 5 1 1 5 ...

str(newdata)
tibble [1 × 6] (S3: tbl_df/tbl/data.frame)
 $ balls       : int 3
 $ strikes     : int 2
 $ outs_when_up: int 1
 $ stand       : chr "R"
 $ pitcher     : int 605200
 $ p_throws    : chr "R"
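
The same comparison can be done programmatically instead of reading two str outputs side by side. The following is a minimal sketch on toy data, not the original model: train and newdata here are hypothetical stand-ins for model$trainingData and the question's newdata, and the needs_factor column name is made up for illustration.

```r
# Toy stand-in for model$trainingData (hypothetical values)
train <- data.frame(
  pitcher = factor(c("112526", "605200")),
  stand   = factor(c("L", "R")),
  balls   = c(0, 1)
)

# Toy stand-in for newdata: same columns, but integer/character types
newdata <- data.frame(pitcher = 605200L, stand = "R", balls = 3L,
                      stringsAsFactors = FALSE)

# Compare the first class of each column the two data sets share
shared <- intersect(names(train), names(newdata))
cmp <- data.frame(
  column = shared,
  train  = vapply(train[shared], function(x) class(x)[1], character(1)),
  new    = vapply(newdata[shared], function(x) class(x)[1], character(1)),
  row.names = NULL,
  stringsAsFactors = FALSE
)

# Flag columns that are factors in training but not in the new data
cmp$needs_factor <- cmp$train == "factor" & cmp$new != "factor"
print(cmp)
```

Only the factor mismatch is flagged here; a numeric versus integer difference, as in balls, is left unflagged.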

One option is to convert those columns in newdata to factors with the same levels as in the training data:

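# columns present in both the training data and newdata
nm1 <- intersect(names(model$trainingData), names(newdata))
# of those, the ones stored as factors in the training data
nm2 <- names(which(sapply(model$trainingData[nm1], is.factor)))
# rebuild each such column in newdata as a factor with the training levels
newdata[nm2] <- Map(function(x, y) factor(x, levels = levels(y)), newdata[nm2], model$trainingData[nm2])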

Now do the prediction

predict(model, newdata)
#[1] Sinker
#Levels: Changeup Curveball Fastball Sinker Slider
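
For anyone who wants to try the conversion without the original model, here is a small self-contained sketch of the same idea on toy data. The train and newdata objects and their values are assumptions made up for illustration; only base R is used. It also adds one check that is not in the answer above: a value that was never seen in training becomes NA when converted with factor(), so it may be worth looking for NAs before calling predict.

```r
# Toy stand-in for model$trainingData: factor columns with known levels
train <- data.frame(
  pitcher  = factor(c("112526", "605200", "115629")),
  stand    = factor(c("L", "R", "R")),
  p_throws = factor(c("R", "R", "L")),
  balls    = c(0, 1, 2)
)

# Toy stand-in for newdata: integer/character instead of factor
newdata <- data.frame(
  balls    = 3L,
  stand    = "R",
  pitcher  = 605200L,
  p_throws = "R",
  stringsAsFactors = FALSE
)

# Columns present in both data sets that are factors in the training data
shared      <- intersect(names(train), names(newdata))
factor_cols <- shared[vapply(train[shared], is.factor, logical(1))]

# Rebuild each column as a factor using the training levels
for (col in factor_cols) {
  newdata[[col]] <- factor(as.character(newdata[[col]]),
                           levels = levels(train[[col]]))
}

# Values not seen in training become NA, so check before predicting
unseen <- vapply(newdata[factor_cols], anyNA, logical(1))
str(newdata)
```

If any entry of unseen is TRUE, that row contains a level the model was not trained on, and converting the column type alone will not make the prediction meaningful.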

Upvotes: 1
