Reputation: 2653
I am trying to use various prediction algorithms from the caret package in R for a regression problem; that is, my target variable is continuous. caret treats the problem as classification, and when I pass any of the regression models I get an error message that says "wrong model type for classification". For reproducibility, let's use the Combined Cycle Power Plant Data Set. The data is in CCPP.zip. Let's predict power as a function of the other variables. Power is a continuous variable.
library(readxl)
library(caret)
power_plant = read_excel("Folds5x2_pp.xlsx")
apply(power_plant,2, class) # shows all columns are numeric
control <- trainControl(method="repeatedcv", number=10, repeats=5)
my_glm <- train(power_plant[,1:4], power_plant[,5],
                method = "lm",
                preProc = c("center", "scale"),
                trControl = control)
The image below is my screenshot:
Upvotes: 4
Views: 11853
Reputation: 51
I got a similar error when I passed the formula as a named argument, formula = y ~ x. Omitting the argument name and passing y ~ x on its own works fine.
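A minimal sketch of the formula interface with the formula passed positionally. It uses the built-in mtcars data instead of the power plant file, purely so the example is self-contained; the predictors wt and hp and the 5-fold setup are arbitrary choices, not taken from the question. In caret's formula method the first argument is named form, which may be why formula = ... does not behave as expected.

```r
library(caret)

set.seed(1)  # resampling is random; fix the seed for repeatability

# Assumption: mtcars stands in for the real data; mpg is a numeric outcome
control <- trainControl(method = "cv", number = 5)

# The formula is given as the first, unnamed argument
fit <- train(mpg ~ wt + hp, data = mtcars,
             method = "lm",
             trControl = control)

# With a numeric outcome this should report a regression problem
fit$modelType
```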
Upvotes: 0
Reputation: 22847
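caret gets confused by tibbles, the tidyverse variant of a data frame that read_excel returns. A likely reason is that power_plant[,5] on a tibble returns a one-column tibble rather than a numeric vector, so caret does not see a numeric outcome and assumes classification. Converting it to a plain data frame before giving it to caret makes everything work: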
library(readxl)
library(caret)
power_plant = read_excel("Folds5x2_pp.xlsx")
apply(power_plant,2, class) # shows all columns are numeric
power_plant <- data.frame(power_plant)
control <- trainControl(method="repeatedcv", number=10, repeats=5)
my_glm <- train(power_plant[,1:4], power_plant[,5],
                method = "lm",
                preProc = c("center", "scale"),
                trControl = control)
my_glm
yielding:
Linear Regression
9568 samples
4 predictor
Pre-processing: centered (4), scaled (4)
Resampling: Cross-Validated (10 fold, repeated 5 times)
Summary of sample sizes: 8612, 8612, 8611, 8612, 8612, 8610, ...
Resampling results:
RMSE Rsquared
4.556703 0.9287933
Tuning parameter 'intercept' was held constant at a value of TRUE
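The difference between a tibble and a plain data frame that matters here can be checked without caret. The sketch below uses a small made-up tibble (the column names x1, x2, y and the values are hypothetical, not the real power plant data) to show what single-bracket column indexing returns in each case.

```r
library(tibble)

# Hypothetical stand-in data; values are illustrative only
toy <- tibble(x1 = c(14.9, 25.2, 5.1),
              x2 = c(41.8, 63.0, 39.4),
              y  = c(463.3, 444.4, 488.6))

y_tbl <- toy[, 3]              # tibble: single brackets keep a one-column tibble
y_vec <- toy[[3]]              # double brackets extract the numeric vector
y_df  <- data.frame(toy)[, 3]  # base data frame: a single column drops to a vector

is.numeric(y_tbl)  # FALSE: a tibble is a list of columns, not a numeric vector
is.numeric(y_vec)  # TRUE
is.numeric(y_df)   # TRUE
```

Based on this, passing the outcome as power_plant[[5]] may be an alternative worth trying, although only the data.frame() conversion above is confirmed to work in this thread.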
Upvotes: 2