Reputation: 67
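When I run:
import h2o
from h2o.estimators import H2ORandomForestEstimator
from h2o.grid.grid_search import H2OGridSearch

h2o.init()
# rf_h: the random forest (DRF) estimator being tuned
rf_h = H2ORandomForestEstimator()
data_h = h2o.H2OFrame(data)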
### Edit: added asfactor() below to change integer target array.
data_h["BPA"] = data_h["BPA"].asfactor()
train, valid = data_h.split_frame(ratios=[.7], seed = 1234)
features = ["bq_packaging_consumepkg", "bq_packaging_microwave_v3", "bq_packaging_plasticbottle_v2",
            "bq_packaging_hotdrink_v3", "bq_packaging_microwsaran_v3", "bq_food_cannedfoods_v2"]
target = "BPA"
# Hyperparameter tuning
params = {"ntrees": [50, 100, 200, 300, 400, 500, 600],
          "max_depth": [10, 30, 50, 70, 90, 110],
          "min_rows": [1, 5, 10, 15, 20, 25]}
criteria = {"strategy": "RandomDiscrete",
            "stopping_rounds": 10,
            "stopping_tolerance": 0.00001,
            "stopping_metric": "misclassification"}
# Grid search and Training
grid_search = H2OGridSearch(model=rf_h, hyper_params=params,
                            search_criteria=criteria)
grid_search.train(x=features, y=target, training_frame=train,
                  validation_frame=valid)
# Sorting the grid
sorted_grid = grid_search.get_grid(sort_by='auc', decreasing = True)
Calling grid_search.get_grid(sort_by = 'auc', decreasing = True)
produces the following error:
H2OResponseError: Server error water.exceptions.H2OIllegalArgumentException:
Error: Invalid argument for sort_by specified. Must be one of: [mae, residual_deviance, r2, mean_residual_deviance, rmsle, rmse, mse]
Request: GET /99/Grids/Grid_DRF_py_29_sid_95b5_model_python_1533334963198_8
params: {'sort_by': 'auc', 'decreasing': 'True'}
Looking at the example in the documentation for the grid search, I believe that I am using the method correctly.
Edit: Added the conversion of the target column from an integer array to a factor array.
Upvotes: 4
Views: 937
Reputation: 5778
This particular question is asking how to get the AUC for a multiclass classification problem (i.e. the target has more than two factor levels; see the posted image in the comments of the original question). H2O does not calculate the AUC for individual categories, so it returns an error if you try to use its binary-classification metric auc().
To see which metrics are available for multiclass classification problems, please see the documentation.
Options include, for example, logloss() and mean_per_class_error().
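As a rough sketch of how that choice could be made in code: the helper below is hypothetical (grid_sort_args is not part of H2O) and simply maps the number of response levels to arguments for get_grid(). It assumes the level count comes from something like train[target].nlevels()[0], and that this is 0 for a numeric column.

```python
def grid_sort_args(n_levels):
    """Pick get_grid() sorting arguments from the response's level count.

    n_levels: number of factor levels in the response column
              (assumed to be 0 for a numeric response, i.e. regression).
    Hypothetical helper for illustration; not an H2O function.
    """
    if n_levels == 0:
        # Regression: rmse is in the list of metrics from the error message.
        return {"sort_by": "rmse", "decreasing": False}
    if n_levels == 2:
        # Binary classification: AUC is available; higher is better.
        return {"sort_by": "auc", "decreasing": True}
    # Multiclass: use logloss instead of AUC; lower is better.
    return {"sort_by": "logloss", "decreasing": False}


# Usage with the objects from the question (needs a running H2O cluster):
# n_levels = train[target].nlevels()[0]
# sorted_grid = grid_search.get_grid(**grid_sort_args(n_levels))
```

Note that decreasing is False for logloss, because a lower logloss means a better model, so the best model ends up first in the sorted grid.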
Upvotes: 2
Reputation: 8819
Error: Invalid argument for sort_by specified. Must be one of: [mae, residual_deviance, r2, mean_residual_deviance, rmsle, rmse, mse]
The problem is that "auc" is not a valid metric for your problem. It looks like you have trained a regression model instead of a binary classification model, which is why AUC is not allowed. The list of metrics in the error message is the list of allowed metrics for a regression problem.
If your response column contains 0s and 1s and you did not convert it to a factor, then H2O will train a regression model instead of a binary classification model. If this is the case, and you want a binary classification model instead, then all you need to do is first convert the response to a factor:
train[target] = train[target].asfactor()
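To guard against this before training, something like the following could work. It is only a sketch: ensure_categorical is a made-up helper name, and it relies on the frame's isfactor() returning a list with one True/False flag per column and on asfactor() returning the converted column.

```python
def ensure_categorical(frame, target):
    """Convert frame[target] to a factor unless it already is one.

    Hypothetical helper: `frame` is expected to behave like an H2OFrame,
    i.e. frame[target].isfactor() returns a one-element list of booleans
    and frame[target].asfactor() returns the categorical version.
    """
    if not frame[target].isfactor()[0]:
        # A numeric response would make H2O train a regression model.
        frame[target] = frame[target].asfactor()
    return frame


# Usage with the names from the question (needs a running H2O cluster):
# data_h = ensure_categorical(data_h, "BPA")
# train, valid = data_h.split_frame(ratios=[.7], seed=1234)
```

Doing the conversion on the full frame before split_frame means the training and validation frames both get the same categorical response.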
Upvotes: 2