W.tan

Comparing predicted values with actual values

I'm new to statistics and R.

I'm currently practicing with a GBM model to predict the "charges" value in insurance company data, using the variables age, bmi, number of children, and smoker. I managed to fit the GBM model, but I don't know how to compare the predicted values with the actual values.

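library(dplyr)   # as_tibble(), mutate(), %>%
library(caret)   # createDataPartition(), trainControl(), train(); method = "gbm" also needs the gbm package

insure <- as_tibble(insurance)   # 'insurance' is the already-loaded insurance data
insure <- insure %>%
  mutate(Agegroup = as.factor(findInterval(age, c(18, 35, 50, 80))))
levels(insure$Agegroup) <- c("Youth", "Mid Aged", "Old")

# Divide the dataset into a training and a validation set
set.seed(233)
train_idx <- createDataPartition(insure$Agegroup, p = 0.8, list = FALSE)
validate  <- insure[-train_idx, ]
trainds   <- insure[train_idx, ]

# Set metric and control: 10-fold cross-validation, scored by RMSE
control <- trainControl(method = "cv", number = 10)
metric  <- "RMSE"

# Set up the model
set.seed(233)
summary(fit.gbm <- train(charges ~ ., data = trainds, method = "gbm",
                         trControl = control, metric = metric, verbose = FALSE))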
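For reference, trainControl(method = "cv", number = 10) with metric = "RMSE" asks caret to do roughly the following for each candidate tuning setting: split the training rows into 10 folds, fit on 9 of them, compute RMSE on the held-out fold, and average the 10 values (caret reports these averages in fit.gbm$results). The sketch below is a minimal base-R version under stated assumptions: the data frame is a hypothetical synthetic stand-in with the same columns as the insurance data, the folds are purely random (caret's own fold assignment may differ in detail), and lm() stands in for the GBM so it runs without extra packages.

```r
set.seed(233)
n <- 500
# Hypothetical synthetic stand-in for the insurance data (not the real data set)
insure <- data.frame(
  age      = sample(18:64, n, replace = TRUE),
  bmi      = runif(n, 16, 45),
  children = sample(0:4, n, replace = TRUE),
  smoker   = factor(sample(c("no", "yes"), n, replace = TRUE, prob = c(0.8, 0.2)))
)
insure$charges <- 250 * insure$age + 300 * insure$bmi + 500 * insure$children +
  20000 * (insure$smoker == "yes") + rnorm(n, sd = 3000)

k <- 10
# Assign every row to one of k folds at random
fold <- sample(rep(seq_len(k), length.out = n))

rmse <- function(pred, obs) sqrt(mean((pred - obs)^2))

cv_rmse <- numeric(k)
for (i in seq_len(k)) {
  # lm() is only a stand-in for the GBM here
  fit  <- lm(charges ~ ., data = insure[fold != i, ])      # fit on the other k - 1 folds
  pred <- predict(fit, newdata = insure[fold == i, ])      # predict the held-out fold
  cv_rmse[i] <- rmse(pred, insure$charges[fold == i])
}
mean(cv_rmse)   # cross-validated RMSE estimate
```

Every RMSE in this loop is computed on rows the model was not fitted on, which is the same idea as scoring on a separate validation set.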

I don't know which data I should use for the comparison. Since the model was trained on the "trainds" data, should I compare its predictions with the "validate" data, or with the full "insure" data?
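One way to reason about it: "validate" holds the rows the model never saw during training, so comparing predictions against validate$charges estimates the error on new data, whereas the full "insure" data also contains the 80% of rows the model was fitted on, which would typically make the error look smaller than it really is. The sketch below illustrates this under stated assumptions: a hypothetical synthetic data set with the insurance columns, a plain sample() split instead of createDataPartition(), and lm() standing in for the GBM.

```r
set.seed(233)
n <- 500
# Hypothetical synthetic stand-in for the insurance data (not the real data set)
insure <- data.frame(
  age      = sample(18:64, n, replace = TRUE),
  bmi      = runif(n, 16, 45),
  children = sample(0:4, n, replace = TRUE),
  smoker   = factor(sample(c("no", "yes"), n, replace = TRUE, prob = c(0.8, 0.2)))
)
insure$charges <- 250 * insure$age + 300 * insure$bmi + 500 * insure$children +
  20000 * (insure$smoker == "yes") + rnorm(n, sd = 3000)

# 80/20 split into disjoint training and validation rows
train_idx <- sample(seq_len(n), size = 0.8 * n)
trainds   <- insure[train_idx, ]
validate  <- insure[-train_idx, ]

fit <- lm(charges ~ ., data = trainds)   # stand-in for the GBM

rmse <- function(pred, obs) sqrt(mean((pred - obs)^2))
# Error on rows the model has already seen (optimistic)
train_rmse <- rmse(predict(fit), trainds$charges)
# Error on rows the model has never seen (out-of-sample estimate)
valid_rmse <- rmse(predict(fit, newdata = validate), validate$charges)
c(train = train_rmse, validate = valid_rmse)
```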

This is my attempt:

plot(predict(fit.gbm),   # should I pass newdata here?
     validate$charges,   # not sure whether to use validate$charges or another data set
     xlab = "Predicted Values",
     ylab = "Observed Values")
abline(a = 0, b = 1, col = "red", lwd = 2)   # 45-degree line: perfect predictions

However, because the two vectors have different lengths, I keep getting this error:

'x' and 'y' lengths differ
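The mismatch appears to come from the first argument: called without newdata, predict() returns one prediction per training row (caret's predict() for a train object falls back to the training data in a similar way), while validate$charges has one value per validation row. A minimal sketch of the fix, assuming a hypothetical synthetic stand-in data set and using lm() in place of the GBM so it runs in base R; with the caret model the corresponding call would be predict(fit.gbm, newdata = validate).

```r
set.seed(233)
n <- 500
# Hypothetical synthetic stand-in for the insurance data (not the real data set)
insure <- data.frame(
  age      = sample(18:64, n, replace = TRUE),
  bmi      = runif(n, 16, 45),
  children = sample(0:4, n, replace = TRUE),
  smoker   = factor(sample(c("no", "yes"), n, replace = TRUE, prob = c(0.8, 0.2)))
)
insure$charges <- 250 * insure$age + 300 * insure$bmi + 500 * insure$children +
  20000 * (insure$smoker == "yes") + rnorm(n, sd = 3000)

train_idx <- sample(seq_len(n), size = 0.8 * n)
trainds   <- insure[train_idx, ]
validate  <- insure[-train_idx, ]

fit <- lm(charges ~ ., data = trainds)   # stand-in for the GBM

# Without newdata, predict() returns one value per training row (400 here),
# so it cannot be paired with validate$charges (100 values).
length(predict(fit))
pred <- predict(fit, newdata = validate)   # one prediction per validation row

plot(pred, validate$charges,
     xlab = "Predicted Values",
     ylab = "Observed Values")
abline(a = 0, b = 1, col = "red", lwd = 2)   # points on this line are perfect predictions
```

With the caret model, postResample(pred = pred, obs = validate$charges) can then summarize the same held-out predictions as RMSE, R-squared, and MAE.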