Reputation: 2085
library(dplyr)
library(caret)
library(doParallel)
cl <- makeCluster(3, outfile = '')
registerDoParallel(cl)
set.seed(2019)
fit1 <- train(x = X_train %>% head(1000) %>% as.matrix(),
              y = y_train %>% head(1000),
              method = 'ranger',
              verbose = TRUE,
              trControl = trainControl(method = 'oob',
                                       verboseIter = TRUE,
                                       allowParallel = TRUE,
                                       classProbs = TRUE),
              tuneGrid = expand.grid(mtry = 2:3,
                                     min.node.size = 1,
                                     splitrule = 'gini'),
              num.trees = 100,
              metric = 'Accuracy',
              importance = 'permutation')
stopCluster(cl)
The code above results in the error:
Aggregating results Something is wrong; all the Accuracy metric values are missing: Accuracy Kappa
Min. : NA Min. : NA
1st Qu.: NA 1st Qu.: NA
Median : NA Median : NA
Mean :NaN Mean :NaN
3rd Qu.: NA 3rd Qu.: NA
Max. : NA Max. : NA
NA's :2 NA's :2
ERROR: Stopping
I've already searched SO for this error and found that there are many possible reasons behind it. Unfortunately, I didn't find anything applicable to my case. Here, the issue seems to be with classProbs = TRUE: when I remove it and the default value of FALSE is used, the model trains successfully. However, I don't understand why it should be a problem, since according to the documentation it is just:
a logical; should class probabilities be computed for classification models (along with predicted values) in each resample?
Data sample:
X_train <- structure(list(V5 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), V1 = c(41.5,
5.3, 44.9, 58.7, 67.9, 56.9, 3.7, 43.4, 38.6, 34.2, 42.3, 29.1,
27.6, 44.2, 55.6, 53.7, 48, 58.4, 54, 7.1, 35.9, 36, 61.2, 24.1,
20.3, 10.8, 13, 69.4, 71.5, 45.6, 34.4, 17.1, 30.1, 68.9, 25.1,
37.4, 55.5, 58.9, 49.8, 47.2, 29.5, 19.9, 24.1, 27, 33.3, 41.9,
33.2, 27.9, 48.4, 41.2), V2 = c(33.1, 35.4, 66.2, 1.8, 5, -0.9,
32.8, 35.8, 36, 4, 65.5, 64, 61, 68.9, 69.3, 59.7, 29.8, 24.4,
62.7, 12.2, 6, -1.2, 63.5, 7.5, 22.9, 40.5, 47.3, 1.6, -1.5,
33.3, 53.3, 23.7, 2.7, 61, 2.4, 13.5, 8.1, 55.1, 29.6, 36.8,
26.8, 26, 30.8, 53.8, 10.6, 1.9, 10.2, 29.1, 51.4, 33.1), V3 = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0), V4 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,
1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -50L))
y_train <- structure(c(2L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 1L,
2L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 1L,
1L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L), .Label = c("plus", "minus"), class = "factor")
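For what it's worth, a frequent cause of this exact error when classProbs = TRUE is factor levels that are not syntactically valid R names (caret uses the levels as column names for the probability predictions). That is not the culprit for the sample data above, since "plus" and "minus" are valid names, but it is a cheap diagnostic to run first:

```r
# Check whether the outcome's factor levels are valid R variable names;
# caret requires this when classProbs = TRUE.
lev <- levels(y_train)
all(lev == make.names(lev))  # TRUE here: "plus" and "minus" are valid names
```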
Upvotes: 0
Views: 579
Reputation: 263301
Based on the responses to https://stats.stackexchange.com/questions/23763/is-there-a-way-to-disable-the-parameter-tuning-grid-feature-in-caret I tried following the advice to set the trainControl "method" to "none", which allowed successful execution. The second answer there implied that random forest methods should not use complicated tuning grids. (I also set the 'mtry' parameter to a single value, but I'm not sure that was necessary.) (I had earlier tried removing the parallel cluster, which had no effect on the error.) You can add features back now that you have code that doesn't throw errors.
fit1 <- train(x = X_train[, 2:3],
              y = factor(y_train),
              method = 'ranger',
              verbose = TRUE,
              trControl = trainControl(method = "none"),
              tuneGrid = expand.grid(mtry = 2,
                                     min.node.size = 1,
                                     splitrule = 'gini'),
              num.trees = 100,
              metric = 'Accuracy',
              importance = 'permutation')
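If you do want the class probabilities back, my understanding is that the clash is specifically between classProbs = TRUE and method = 'oob': with classProbs = TRUE caret grows a ranger probability forest, whose out-of-bag error is a Brier-type score rather than a classification error, so the 'oob' resampling method has no Accuracy values to aggregate. A sketch of a workaround (untested against this exact data) is to switch to cross-validation resampling:

```r
# Sketch: keep classProbs = TRUE but resample via cross-validation
# instead of out-of-bag error, so Accuracy can be computed per fold.
fit2 <- train(x = X_train,
              y = y_train,
              method = 'ranger',
              trControl = trainControl(method = 'cv',
                                       number = 5,
                                       classProbs = TRUE),
              tuneGrid = expand.grid(mtry = 2:3,
                                     min.node.size = 1,
                                     splitrule = 'gini'),
              num.trees = 100,
              metric = 'Accuracy',
              importance = 'permutation')
```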
Upvotes: 1