Reputation: 508
I am learning how to make predictions with H2O models. When I run:
data_frame = h2o.H2OFrame(python_obj=data[1:], column_names=data[0])
data_train, data_valid, data_test = data_frame.split_frame(ratios= config.trainer_analizer_ratios, seed=config.trainer_analizer_seed)
# H2OGeneralizedLinearEstimator
allLog += "\n Starting H2OGeneralizedLinearEstimator"
model_gle = h2o.estimators.H2OGeneralizedLinearEstimator()
model_gle.train(x=predictors, y=response, training_frame= data_train, validation_frame= data_valid)
print(model_gle)
perf_gle = model_gle.model_performance(test_data= data_test)
print("GLM performance:", perf_gle)
I get the following output:
** Reported on test data. **
MSE: 494.25950875189955
RMSE: 22.23194792976764
MAE: 17.380709221249717
RMSLE: 1.217426465475652
R^2: 0.04331665117221439
Mean Residual Deviance: 494.25950875189955
Null degrees of freedom: 1177
Residual degrees of freedom: 1174
Null deviance: 608812.1064795277
Residual deviance: 582237.7013097376
AIC: 10660.224689554776
Why don't I get the AUC metric? I need it to compare different models.
Upvotes: 0
Views: 93
Reputation: 5778
GLM is treating this as a regression problem, and AUC is only reported for classification models. You need to tell it that you are solving a classification problem. You can do this with the family parameter (please see the documentation for an example), and you may also need to convert your target column to type enum using the asfactor() method.
For your convenience here is the example code snippet that the link points to:
import h2o
from h2o.estimators.glm import H2OGeneralizedLinearEstimator
h2o.init()
# import the cars dataset:
# this dataset is used to classify whether or not a car is economical based on
# the car's displacement, power, weight, and acceleration, and the year it was made
cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
# convert response column to a factor
cars["economy_20mpg"] = cars["economy_20mpg"].asfactor()
# set the predictor names and the response column name
predictors = ["displacement","power","weight","acceleration","year"]
response = "economy_20mpg"
# split into train and validation sets
train, valid = cars.split_frame(ratios = [.8])
# try using the `family` parameter:
# Initialize and train a GLM
cars_glm = H2OGeneralizedLinearEstimator(family = 'binomial')
cars_glm.train(x = predictors, y = response, training_frame = train, validation_frame = valid)
# print the auc for the validation data
print(cars_glm.auc(valid = True))
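As a side note on why AUC only shows up for classification: it measures how often the model scores a positive example above a negative one, so it needs a two-class target and has no meaning for a numeric response. Here is a minimal sketch in plain Python, with no H2O involved. The function name auc_from_scores and the toy data are made up for illustration, and this is not necessarily how H2O computes the metric internally.

```python
def auc_from_scores(labels, scores):
    """Pairwise AUC: the fraction of (positive, negative) pairs in which
    the positive example gets the higher score. Ties count as half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        # Without both classes there are no pairs to compare.
        raise ValueError("AUC needs at least one positive and one negative label")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data (hypothetical): binary labels and predicted probabilities.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auc_from_scores(labels, scores))  # 0.75
```

The pairwise loop is quadratic, which is fine for a demonstration but not for large data. For real model comparison, use the auc() value that H2O reports once the model is trained as a binomial classifier.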
Upvotes: 1