h2o error when run on a subset of the data but runs perfectly on the original data

Question

The error I am getting is shown below. The subset (~100k examples) of my data has exactly the same number of columns as the original dataset (400k examples), yet training runs perfectly on the original dataset and fails on the subset.

Traceback (most recent call last)
 in ()
     15 from h2o.estimators.gbm import H2OGradientBoostingEstimator
     16 gbm_cv3 = H2OGradientBoostingEstimator(nfolds=2)
---> 17 gbm_cv3.train(x=x, y=y, training_frame=train)
     18 ## Getting all cross validated models
     19 all_models = gbm_cv3.cross_validation_models()



error_count = 2
http_status = 412
msg = u'Illegal argument(s) for GBM model: GBM_model_python_1533214798867_179.  Details: ERRR on field: _response: Response cannot be constant.'
dev_msg = u'Illegal argument(s) for GBM model: GBM_model_python_1533214798867_179.  Details: ERRR on field: _response: Response cannot be constant.'

TomKraljevic · Accepted Answer

This is a user error.

The "response" is the y column, and in the subset of data you have given, every row has the same value for y. You cannot train a supervised machine learning model when every y value is the same, because there is nothing for the model to learn.

This can happen if you have a rare outcome: when you randomly split the data, you might get a partition in which only one value is represented. To see which unique values the response column contains, run the following in Python: train[y].unique()
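The same check, and one way to avoid the problem when building the subset, can be sketched in plain Python without H2O. This is only an illustration under assumptions: the data is made up, and `response_is_constant` and `stratified_subset` are hypothetical helper names, not part of the h2o package. The idea is to sample each class separately so that a rare outcome cannot disappear from the subset.

```python
import random
from collections import Counter, defaultdict


def response_is_constant(values):
    """Return True when the response holds fewer than two distinct values."""
    return len(set(values)) < 2


def stratified_subset(rows, label_of, fraction, seed=0):
    """Sample `fraction` of the rows of each class, keeping at least one row per class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row in rows:
        by_class[label_of(row)].append(row)
    subset = []
    for members in by_class.values():
        k = max(1, round(len(members) * fraction))
        subset.extend(rng.sample(members, k))
    return subset


def label_of(row):
    # Hypothetical row layout: (row_id, y)
    return row[1]


# Made-up data: 400 rows, only 4 of them positive (a rare outcome).
rows = [(i, 1 if i % 100 == 0 else 0) for i in range(400)]

# A subset that happens to miss every positive has a constant response.
negatives_only = [row for row in rows if label_of(row) == 0][:100]
print(response_is_constant([label_of(r) for r in negatives_only]))  # True

# Sampling per class keeps both values of y in the subset.
subset = stratified_subset(rows, label_of, fraction=0.25)
print(Counter(label_of(r) for r in subset))
```

If the class counts printed for your subset show only a single value, the subset has to be drawn differently before a model can be trained on it.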
