Reputation: 67
Final Edit: this problem ended up occurring because the target array were integers that were supposed to represent categories so it was doing a regression. Once I converted them into factors using .asfactor()
, then the confusion matrix method detailed in the answer below worked
I am trying to run a confusion matrix on my Random Forest Model (my_model
), but the documentation has been less than helpful. From here it says the command is h2o.confusionMatrix(my_model)
but there is no such thing in 3.0.
Here are the steps to fit the model:
from h2o.estimators.random_forest import H2ORandomForestEstimator
data_h = h2o.H2OFrame(data)
train, valid = data_h.split_frame(ratios=[.7], seed = 1234)
my_model = H2ORandomForestEstimator(model_id = "rf_h", ntrees = 400,
max_depth = 30, nfolds = 8, seed = 25)
my_model.train(x = features, y = target, training_frame=train)
pred = rf_h.predict(valid)
I have tried the following:
my_model.confusion_matrix()
AttributeError: type object 'H2ORandomForestEstimator' has no attribute
'confusion_matrix'
Gotten from this example.
I have attempted to use tab completion to find out what it might be and have tried:
h2o.model.confusion_matrix(my_model)
TypeError: 'module' object is not callable
and
h2o.model.ConfusionMatrix(my_model)
which outputs simply all the model diagnostics and then the error:
H2OTypeError: Argument `cm` should be a list, got H2ORandomForestEstimator
Finally,
h2o.model.ConfusionMatrix(pred)
Which gives the same error as above.
Not sure what to do here, how can I view the results of the confusion matrix of the model?
Edit: Added more code to the beginning of the question for Context
Upvotes: 1
Views: 7187
Reputation: 5778
please see the documentation for the full parameter list. For your convenience here is the list confusion_matrix(metrics=None, thresholds=None, train=False, valid=False, xval=False)
.
Here is a working example of how to use the method:
import h2o
from h2o.estimators.random_forest import H2ORandomForestEstimator
h2o.init()
# import the cars dataset:
# this dataset is used to classify whether or not a car is economical based on
# the car's displacement, power, weight, and acceleration, and the year it was made
cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
# convert response column to a factor
cars["economy_20mpg"] = cars["economy_20mpg"].asfactor()
# set the predictor names and the response column name
predictors = ["displacement","power","weight","acceleration","year"]
response = "economy_20mpg"
# split into train and validation sets
train, valid = cars.split_frame(ratios = [.8], seed = 1234)
# try using the binomial_double_trees (boolean parameter):
# Initialize and train a DRF
cars_drf = H2ORandomForestEstimator(binomial_double_trees = False, seed = 1234)
cars_drf.train(x = predictors, y = response, training_frame = train, validation_frame = valid)
cars_drf.confusion_matrix()
# or specify the validation frame
cars_drf.confusion_matrix(valid=True)
Upvotes: 4