Reputation: 53
I am getting the error shown below. The subset (~100k examples) of my data has exactly the same number of columns as the original dataset (400k examples). Training runs perfectly on the original dataset, but fails on the subset.
Traceback (most recent call last)
<ipython-input-14-35cf02055a2e> in <module>()
15 from h2o.estimators.gbm import H2OGradientBoostingEstimator
16 gbm_cv3 = H2OGradientBoostingEstimator(nfolds=2)
---> 17 gbm_cv3.train(x=x, y=y, training_frame=train)
18 ## Getting all cross validated models
19 all_models = gbm_cv3.cross_validation_models()
error_count = 2
http_status = 412
msg = u'Illegal argument(s) for GBM model:
GBM_model_python_1533214798867_179. Details: ERRR on field:
_response: Response cannot be constant.'
dev_msg = u'Illegal argument(s) for GBM model:
GBM_model_python_1533214798867_179. Details: ERRR on field:
_response: Response cannot be constant.'
Upvotes: 3
Views: 1751
Reputation: 3671
This is a user error, not a bug in H2O.
The "response" is the y column, and in the subset you are training on, every row has the same value for y. You cannot train a supervised machine learning model when every y value is the same, because there is nothing for the model to learn.
This can happen when the outcome is rare: if you randomly split or subsample the data, you might get a partition in which only one value is represented. To check which unique values the response column contains in Python, run: train[y].unique()
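To illustrate how a subset can end up with a constant response, here is a minimal sketch in plain Python. It does not use H2O; the data, the column names "x" and "y", and the helper functions response_counts and is_constant are all made up for the example. The idea is the same as calling train[y].unique() on the H2O frame before training.

```python
from collections import Counter


def response_counts(rows, response):
    """Count how often each value of the response column occurs."""
    return Counter(row[response] for row in rows)


def is_constant(rows, response):
    """Return True when the response column has at most one distinct value."""
    return len(response_counts(rows, response)) <= 1


# Hypothetical data: 400 rows with a rare outcome (only the last 4 rows are positive).
full = [{"x": i, "y": 0} for i in range(396)]
full += [{"x": i, "y": 1} for i in range(396, 400)]

# A subset that happens to miss every positive row.
subset = full[:100]

print(response_counts(full, "y"))    # Counter({0: 396, 1: 4})
print(response_counts(subset, "y"))  # Counter({0: 100})
print(is_constant(full, "y"), is_constant(subset, "y"))  # False True
```

If the check reports a constant response for your subset, rebuild the subset so that it contains rows from every outcome before calling train().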
Upvotes: 5