Reputation: 101
I have trained and saved my H2O AutoML model. After reloading it, I get the following error when I call the predict method:
java.lang.IllegalArgumentException: Test/Validation dataset has a non-categorical column 'response' which is categorical in the training data
I did not specify any encoding when creating the model, but I am getting this error now. Can anyone help me with this issue?
Any help will be highly appreciated.
Upvotes: 10
Views: 4762
Reputation: 82
Maybe a little late, but this problem still occurs, especially if you have lots of columns. Here is what I did to solve it:
H2O gives one of two possible messages:
Test/Validation dataset has a non-categorical column '<YOUR-COLUMN>' which is categorical in the training data
or
Test/Validation dataset has categorical column '<YOUR-COLUMN>' which is real-valued in the training data
So what I did was extract the column name from the message and convert that column to categorical or numeric, depending on which message appears.
So my Python code looks like this:
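import h2o

# Assumes `df` (a pandas DataFrame) and the trained `rf_model` already exist.
hf = h2o.H2OFrame(df)
transform = True
while transform:
    try:
        prediction = rf_model.predict(hf)
        transform = False
    except Exception as inst:
        err_msg = str(inst)
        if 'categorical' not in err_msg:
            raise  # not a column-type mismatch, so don't retry forever
        tarr = err_msg.split('categorical')
        # The column name is the first quoted token after "categorical"
        column = tarr[1].split("'")[1]
        if tarr[0][-1] == '-':  # "non-categorical": training data had a factor
            hf[column] = hf[column].asfactor()
            print(f'{column} converted to categorical')
        else:  # training data had a real-valued column
            # Go through strings so factor labels, not level codes, become numbers
            hf[column] = hf[column].ascharacter().asnumeric()
            print(f'{column} converted to real-valued')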
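The string splitting in the loop depends on where the word "categorical" falls in the message. The same extraction can be sketched with regular expressions, which also makes it easy to tell unrelated errors apart. This is a minimal sketch, not part of the original answer: `parse_type_mismatch` is a hypothetical helper, and the patterns assume the exact wording of the two H2O messages quoted above, which may differ between H2O versions.

```python
import re
from typing import Optional, Tuple

# Patterns built from the two H2O messages quoted above (assumption: the
# wording is stable across the H2O versions you use).
_TO_CATEGORICAL = re.compile(
    r"has a non-categorical column '([^']+)' which is categorical in the training data"
)
_TO_NUMERIC = re.compile(
    r"has categorical column '([^']+)' which is real-valued in the training data"
)


def parse_type_mismatch(message: str) -> Optional[Tuple[str, str]]:
    """Return (column, target) where target is 'categorical' or 'numeric',
    or None if the message is not one of the two known mismatch errors."""
    match = _TO_CATEGORICAL.search(message)
    if match:
        return match.group(1), "categorical"
    match = _TO_NUMERIC.search(message)
    if match:
        return match.group(1), "numeric"
    return None
```

Returning `None` for anything else lets the retry loop re-raise unexpected errors instead of spinning on them.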
Hope it helps!
Upvotes: 0
Reputation: 2007
This issue is related to values in a particular column of the new data that don't appear in the training set. In these cases I convert the column types to numeric (or string).
def _convert_h2oframe_to_numeric(h2o_frame, training_columns):
    """Cast every listed column of an H2OFrame to numeric, in place."""
    for column in training_columns:
        h2o_frame[column] = h2o_frame[column].asnumeric()
    return h2o_frame
Remember to apply this function in both the training and the prediction process.
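A more targeted variant of the same idea, offered as an assumption-laden sketch rather than the answerer's method: record the training frame's column types (`H2OFrame.types` returns a dict of column name to type, with categorical columns reported as `'enum'`) and compare them with the new frame's types before predicting, so only the mismatched columns get converted. `plan_conversions` is a hypothetical helper, and saving the training types alongside the model (for example as JSON) is assumed.

```python
from typing import Dict

# H2O type strings treated as numeric (assumption: int and real are
# interchangeable for prediction purposes).
NUMERIC_TYPES = {"int", "real"}


def plan_conversions(train_types: Dict[str, str],
                     new_types: Dict[str, str]) -> Dict[str, str]:
    """Return {column: 'categorical' | 'numeric'} for every column whose
    categorical/numeric kind differs from the training frame."""
    plan = {}
    for column, train_type in train_types.items():
        new_type = new_types.get(column)
        if new_type is None or new_type == train_type:
            continue  # missing from the new data, or already identical
        if train_type == "enum" and new_type != "enum":
            plan[column] = "categorical"
        elif train_type in NUMERIC_TYPES and new_type not in NUMERIC_TYPES:
            plan[column] = "numeric"
    return plan


# Hypothetical usage with a live H2O session, where `saved_types` was
# stored from the training frame's `.types`:
# for column, target in plan_conversions(saved_types, hf.types).items():
#     if target == "categorical":
#         hf[column] = hf[column].asfactor()
#     else:
#         hf[column] = hf[column].ascharacter().asnumeric()
```

Unlike converting every column to numeric, this leaves columns that were categorical during training as factors.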
Upvotes: 4