Reputation: 51
I'm trying to apply stacking to my dataset, but I'm stuck on the error below.
# Load library
library(DJL)
library(caret)
library(caretEnsemble)
# Load data and rename the target levels to valid R names (needed for classProbs = TRUE)
df <- dataset.engine.2015[, -c(1, 2)]
levels(df$Type) <- list(NA.D = "NA-D", NA.P = "NA-P", SC.P = "SC-P", TC.D = "TC-D", TC.P = "TC-P")
# Run
st.methods <- c("lda", "rpart", "glm", "knn", "svmRadial")
st.control <- trainControl(method = "repeatedcv", number = 5, repeats = 3,
                           savePredictions = TRUE, classProbs = TRUE)
st.models <- caretList(Type ~., data = df, trControl = st.control, methodList = st.methods)
Then I get this:
Something is wrong; all the Accuracy metric values are missing:
Accuracy Kappa
Min. : NA Min. : NA
1st Qu.: NA 1st Qu.: NA
Median : NA Median : NA
Mean :NaN Mean :NaN
3rd Qu.: NA 3rd Qu.: NA
Max. : NA Max. : NA
NA's :1 NA's :1
Error: Stopping
In addition: There were 18 warnings (use warnings() to see them)
Can anyone help me to fix this error?
Upvotes: 3
Views: 5767
Reputation: 24262
The glm model cannot be used to predict a categorical dependent variable with more than two categories. Try deleting glm from st.methods, or substitute glm with, for example, multinom, gbm, or rf (random forest).
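As a quick guard before calling caretList, one can check how many classes the target has and drop glm when there are more than two. This is only a minimal base-R sketch: it uses the built-in iris data as a stand-in for dataset.engine.2015, and the variable name target is illustrative.

```r
# Stand-in for df$Type: iris$Species has 3 classes (the original Type has 5).
target <- iris$Species

st.methods <- c("lda", "rpart", "glm", "knn", "svmRadial")

# glm is only suitable for a two-class target, so drop it otherwise.
if (nlevels(target) > 2) {
  st.methods <- setdiff(st.methods, "glm")
}

print(st.methods)
```

The remaining methods can then be passed to caretList through methodList as in the question.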
Here are two useful experiments. In the first, we use only glm:
rm(list=ls())
library(DJL)
library(caret)
library(caretEnsemble)
df <- dataset.engine.2015[, -c(1, 2)]
levels(df$Type) <- list(NA.D = "NA-D", NA.P = "NA-P", SC.P = "SC-P", TC.D = "TC-D", TC.P = "TC-P")
st.control <- trainControl(method = "repeatedcv", number = 5, repeats = 3,
                           savePredictions = TRUE, classProbs = TRUE)
st.methods <- c("glm")
st.models <- caretList(Type ~., data = df, trControl = st.control, methodList = st.methods)
Here is the error message:
Something is wrong; all the Accuracy metric values are missing:
Accuracy Kappa
Min. : NA Min. : NA
1st Qu.: NA 1st Qu.: NA
Median : NA Median : NA
Mean :NaN Mean :NaN
3rd Qu.: NA 3rd Qu.: NA
Max. : NA Max. : NA
NA's :1 NA's :1
Error in train.default(x, y, weights = w, ...) : Stopping
In addition: There were 18 warnings (use warnings() to see them)
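One way to see why glm is a poor fit for this target can be sketched outside caret, with base R only. The example assumes the built-in iris data as a stand-in (the original dataset is not used). With family = binomial and a factor response, glm treats the first level as failure and all other levels as success, so a multi-class target is collapsed to two classes.

```r
# iris$Species has 3 levels: setosa, versicolor, virginica.
fit <- glm(Species ~ Sepal.Width, data = iris, family = binomial)

# The response actually modelled is binary: setosa (first level) vs. everything else.
print(table(iris$Species, fit$y))

# Only a single probability per row is produced, not one per class.
p <- predict(fit, type = "response")
str(p)
```

This does not reproduce the exact caret error, but it shows that a binomial glm has no way to return one probability per class for a target with more than two levels.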
Now we substitute glm with multinom:
st.methods <- c("multinom")
st.models <- caretList(Type ~., data = df, trControl = st.control, methodList = st.methods)
print(st.models)
The output is:
$multinom
Penalized Multinomial Regression
1206 samples
5 predictor
5 classes: 'NA.D', 'NA.P', 'SC.P', 'TC.D', 'TC.P'
No pre-processing
Resampling: Cross-Validated (5 fold, repeated 3 times)
Summary of sample sizes: 964, 965, 965, 965, 965, 964, ...
Resampling results across tuning parameters:
  decay  Accuracy   Kappa
  0e+00  0.9306411  0.8518294
  1e-04  0.9300901  0.8506964
  1e-01  0.9328507  0.8564466
Accuracy was used to select the optimal model using the largest value.
The final value used for the model was decay = 0.1.
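caret's multinom method wraps multinom from the nnet package, and decay is the weight-decay penalty passed to it. The following is a minimal sketch of fitting the same kind of model directly, assuming nnet is installed and again using iris as a stand-in dataset rather than the original data.

```r
library(nnet)

# Multinomial logistic regression with weight decay 0.1 on a 3-class target.
fit <- multinom(Species ~ ., data = iris, decay = 0.1, trace = FALSE)

# One probability column per class; each row sums to 1.
probs <- predict(fit, type = "probs")
print(head(round(probs, 3)))

# Predicted class labels and the training-set confusion table.
pred <- predict(fit, type = "class")
print(table(observed = iris$Species, predicted = pred))
```

Unlike glm, this model returns a probability for every class, which is what classProbs = TRUE needs when the saved predictions are later used for stacking.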
Upvotes: 3