Reputation: 360
I am training a binary classification model with H2O AutoML using the default cross-validation (nfolds=5). I need to obtain the AUC score for each holdout fold in order to compute the variability.
This is the code I am using:
import h2o
from h2o.automl import H2OAutoML

h2o.init()
prostate = h2o.import_file("https://h2o-public-test-data.s3.amazonaws.com/smalldata/prostate/prostate.csv")
# convert columns to factors
prostate['CAPSULE'] = prostate['CAPSULE'].asfactor()
prostate['RACE'] = prostate['RACE'].asfactor()
prostate['DCAPS'] = prostate['DCAPS'].asfactor()
prostate['DPROS'] = prostate['DPROS'].asfactor()
# set the predictor and response columns
predictors = ["AGE", "RACE", "VOL", "GLEASON"]
response_col = "CAPSULE"
# split into train and testing sets
train, test = prostate.split_frame(ratios=[0.8], seed=1234)
aml = H2OAutoML(seed=1, max_runtime_secs=100, exclude_algos=["DeepLearning", "GLM"],
                nfolds=5, keep_cross_validation_predictions=True)
aml.train(x=predictors, y=response_col, training_frame=prostate)
leader = aml.leader
leader = aml.leader
I checked that leader is not a StackedEnsemble model (for which the validation metrics are not available), but I am still not able to retrieve the five AUC scores.
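For reference, this is roughly how I do that check (a minimal sketch; it just looks at the model ID, which contains "StackedEnsemble" for ensemble models, as seen on the leaderboard):
# make sure the leading model is not a stacked ensemble
print(leader.model_id)
assert "StackedEnsemble" not in leader.model_id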
Any idea on how to do so?
Upvotes: 3
Views: 1318
Reputation: 15
I submitted the following ticket: https://h2oai.atlassian.net/browse/PUBDEV-8984
The function below is for when you want to sort your grid search by a specific cross-validation metric.
import h2o
import pandas as pd

def sort_grid(grid, metric):
    # input: a trained H2OGridSearch object and the CV metric to sort by, e.g.
    # 'accuracy', 'auc', 'err', 'err_count', 'f0point5', 'f1', 'f2', 'lift_top_group',
    # 'logloss', 'max_per_class_error', 'mcc', 'mean_per_class_accuracy',
    # 'mean_per_class_error', 'mse', 'pr_auc', 'precision', 'r2', 'recall', 'rmse', 'specificity'
    model_ids = []
    cross_val_values = []
    for model_id in grid.model_ids:
        model = h2o.get_model(model_id)
        # the CV metrics summary is a table whose first column holds the metric names
        # and whose 'mean' column holds the mean over the holdout folds
        summary = model.cross_validation_metrics_summary().as_data_frame()
        row = summary.loc[summary.iloc[:, 0] == metric, 'mean']
        if row.empty:
            return 0  # metric not found in the CV metrics summary
        model_ids.append(model_id)
        cross_val_values.append(float(row.iloc[0]))
    df = pd.DataFrame(
        {'Model_IDs': model_ids, metric: cross_val_values}
    )
    df = df.sort_values([metric], ascending=False)
    best_model = h2o.get_model(df.iloc[0, 0])
    # outputs: ordered grid as a pandas DataFrame and the best model
    return df, best_model
I used this for a binary classification model.
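For example, assuming a grid named gbm_grid has already been trained (the name is just illustrative), sorting it by mean cross-validated AUC looks like this:
# sort the grid by mean CV AUC and grab the best model
df, best_model = sort_grid(gbm_grid, 'auc')
print(df)
print(best_model.model_id)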
Upvotes: 0
Reputation: 8819
Here's how it's done:
import h2o
from h2o.automl import H2OAutoML
h2o.init()
# import prostate dataset
prostate = h2o.import_file("https://h2o-public-test-data.s3.amazonaws.com/smalldata/prostate/prostate.csv")
# convert columns to factors
prostate['CAPSULE'] = prostate['CAPSULE'].asfactor()
prostate['RACE'] = prostate['RACE'].asfactor()
prostate['DCAPS'] = prostate['DCAPS'].asfactor()
prostate['DPROS'] = prostate['DPROS'].asfactor()
# set the predictor and response columns
predictors = ["AGE", "RACE", "VOL", "GLEASON"]
response_col = "CAPSULE"
# split into train and testing sets
train, test = prostate.split_frame(ratios = [0.8], seed = 1234)
# run AutoML for 100 seconds
aml = H2OAutoML(seed=1, max_runtime_secs=100, exclude_algos=["DeepLearning", "GLM"],
nfolds=5, keep_cross_validation_predictions=True)
aml.train(x=predictors, y=response_col, training_frame=prostate)
# Get the leader model
leader = aml.leader
There is a caveat to mention here about cross-validated AUC -- H2O currently stores two computations of CV AUC. One is an aggregated version (take the AUC of aggregated CV predictions), and the other is the "true" definition of cross-validated AUC (an average of the k AUCs from k-fold cross-validation). The latter is stored in an object which also contains the individual fold AUCs, as well as the standard deviation across the folds.
If you're wondering why we do this, there are some historical & technical reasons why we have two versions, as well as an open ticket to only ever report the latter.
The first one is what you get when you do this (and also what appears on the AutoML Leaderboard).
# print CV AUC for leader model
print(leader.model_performance(xval=True).auc())
If you want the fold-wise AUCs so you can compute or view their mean and variability (standard deviation), you can do that by looking here:
# print CV metrics summary
leader.cross_validation_metrics_summary()
Output:
Cross-Validation Metrics Summary:
mean sd cv_1_valid cv_2_valid cv_3_valid cv_4_valid cv_5_valid
----------- ---------- ----------- ------------ ------------ ------------ ------------ ------------
accuracy 0.71842104 0.06419111 0.7631579 0.6447368 0.7368421 0.7894737 0.65789473
auc 0.7767409 0.053587236 0.8206676 0.70905924 0.7982079 0.82538515 0.7303846
aucpr 0.6907578 0.0834025 0.78737605 0.7141305 0.7147677 0.67790955 0.55960524
err 0.28157896 0.06419111 0.23684211 0.35526314 0.2631579 0.21052632 0.34210527
err_count 21.4 4.8785243 18.0 27.0 20.0 16.0 26.0
--- --- --- --- --- --- --- ---
precision 0.61751753 0.08747421 0.675 0.5714286 0.61702126 0.7241379 0.5
r2 0.20118153 0.10781976 0.3014902 0.09386432 0.25050205 0.28393403 0.07611712
recall 0.84506994 0.08513061 0.84375 0.9142857 0.9354839 0.7241379 0.8076923
rmse 0.435928 0.028099842 0.41264254 0.47447023 0.42546 0.41106534 0.4560018
specificity 0.62579334 0.15424488 0.70454544 0.41463414 0.6 0.82978725 0.58
See the whole table with leader.cross_validation_metrics_summary().as_data_frame()
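If you want the five holdout-fold AUCs as plain numbers (for example, to compute your own mean and standard deviation), here is a minimal sketch based on the table above:
import numpy as np

# pull the CV metrics summary into pandas; the first column holds the metric names
summary = leader.cross_validation_metrics_summary().as_data_frame()
auc_row = summary[summary.iloc[:, 0] == 'auc']

# the per-fold values live in the cv_*_valid columns
fold_cols = [c for c in summary.columns if c.startswith('cv_')]
fold_aucs = auc_row[fold_cols].astype(float).values.ravel()

print(fold_aucs)                           # the five holdout-fold AUCs
print(fold_aucs.mean(), fold_aucs.std())   # np.std defaults to ddof=0, so it can
                                           # differ slightly from the 'sd' column above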
Here's what the leaderboard looks like (it stores the aggregated CV AUCs). In this case, because the data is so small (380 rows), there's a noticeable difference between the two reported CV AUC values; however, for larger datasets, they should be much closer estimates.
# print the whole Leaderboard (all CV metrics for all models)
lb = aml.leaderboard
print(lb)
That will print the top of the leaderboard:
model_id auc logloss aucpr mean_per_class_error rmse mse
--------------------------------------------------- -------- --------- -------- ---------------------- -------- --------
XGBoost_grid__1_AutoML_20200924_200634_model_2 0.769716 0.565326 0.668827 0.290806 0.436652 0.190665
GBM_grid__1_AutoML_20200924_200634_model_4 0.762993 0.56685 0.666984 0.279145 0.437634 0.191524
XGBoost_grid__1_AutoML_20200924_200634_model_9 0.762417 0.570041 0.645664 0.300121 0.440255 0.193824
GBM_grid__1_AutoML_20200924_200634_model_6 0.759912 0.572651 0.636713 0.30097 0.440755 0.194265
StackedEnsemble_BestOfFamily_AutoML_20200924_200634 0.756486 0.574461 0.646087 0.294002 0.441413 0.194845
GBM_grid__1_AutoML_20200924_200634_model_7 0.754153 0.576821 0.641462 0.286041 0.442533 0.195836
XGBoost_1_AutoML_20200924_200634 0.75411 0.584216 0.626074 0.289237 0.443911 0.197057
XGBoost_grid__1_AutoML_20200924_200634_model_3 0.753347 0.57999 0.629876 0.312056 0.4428 0.196072
GBM_grid__1_AutoML_20200924_200634_model_1 0.751706 0.577175 0.628564 0.273603 0.442751 0.196029
XGBoost_grid__1_AutoML_20200924_200634_model_8 0.749446 0.576686 0.610544 0.27844 0.442314 0.195642
[28 rows x 7 columns]
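By default only the head of the leaderboard is printed; since the leaderboard is a regular H2OFrame, you can show every row or pull it into pandas like this:
# print all rows of the leaderboard
print(lb.head(rows=lb.nrows))

# or convert it to a pandas DataFrame
lb_df = lb.as_data_frame()
print(lb_df)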
Upvotes: 3