ATUL AGARWAL

Reputation: 101

H2O AutoML error "Test/Validation dataset has a non-categorical column which is categorical in the training data" on predict

I have trained and saved my H2O AutoML model. After reloading it, I get the following error when I call the predict method: java.lang.IllegalArgumentException: Test/Validation dataset has a non-categorical column 'response' which is categorical in the training data

I did not specify any encoding when creating the model, but I am getting this error now. Can anyone help me with this issue?

Any help will be highly appreciated.

Upvotes: 10

Views: 4762

Answers (2)

William Castrillon

Reputation: 82

Maybe a little late, but this problem still occurs, especially if you have lots of columns. Here is what I did to solve it.

H2O gives one of two possible messages:

Test/Validation dataset has a non-categorical column '<YOUR-COLUMN>' which is categorical in the training data

or

Test/Validation dataset has categorical column '<YOUR-COLUMN>' which is real-valued in the training data

So, what I did was extract the column name from the message and convert that column to categorical or numeric, depending on which of the two messages was raised.
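Splitting on the word 'categorical' works, but it silently depends on where that word appears. A slightly stricter alternative is to match the two messages quoted above with regular expressions. This is a minimal sketch that assumes H2O words the messages exactly as shown here (other H2O versions may differ) and that column names contain no single quotes; `parse_type_mismatch` is a hypothetical helper, not part of the H2O API.

```python
import re

# Patterns mirror the two H2O messages quoted above (assumed wording);
# the column name is captured from between the single quotes.
_TO_FACTOR = re.compile(
    r"non-categorical column '([^']+)' which is categorical in the training data")
_TO_NUMERIC = re.compile(
    r"has categorical column '([^']+)' which is real-valued in the training data")


def parse_type_mismatch(err_msg):
    """Return (column, method) where method is 'asfactor' or 'asnumeric',
    or None when the message is not one of the two type-mismatch errors."""
    match = _TO_FACTOR.search(err_msg)
    if match:
        return match.group(1), "asfactor"   # column must become categorical
    match = _TO_NUMERIC.search(err_msg)
    if match:
        return match.group(1), "asnumeric"  # column must become real-valued
    return None


print(parse_type_mismatch(
    "Test/Validation dataset has a non-categorical column 'response' "
    "which is categorical in the training data"))
# ('response', 'asfactor')
```

Inside a retry loop, a `None` result means the exception is something else and should be re-raised; otherwise the column can be converted with `hf[column] = getattr(hf[column], method)()`.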

So, my Python code looks like this:

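import h2o

# Assumes h2o.init() was called, df is the new pandas DataFrame and rf_model is the reloaded model.
hf = h2o.H2OFrame(df)
converted = set()
while True:
    try:
        prediction = rf_model.predict(hf)
        break  # prediction succeeded, stop retrying
    except Exception as inst:
        err_msg = str(inst)
        tarr = err_msg.split('categorical')
        # Not one of the two type-mismatch messages: re-raise instead of looping forever
        if len(tarr) < 2 or "'" not in tarr[1]:
            raise
        column = tarr[1].split("'")[1]
        if column in converted:
            raise  # converting this column already failed to fix the error, so give up
        converted.add(column)
        if tarr[0].endswith('-'):  # "non-categorical column ... categorical in the training data"
            hf[column] = hf[column].asfactor()
            print(f'{column} converted to categorical')
        else:  # "categorical column ... real-valued in the training data"
            hf[column] = hf[column].asnumeric()
            print(f'{column} converted to real-valued')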

Hope it helps!

Upvotes: 0

CezarySzulc

Reputation: 2007

This issue is related to new example data in a particular column that doesn't exist in the training set. In these cases I convert the column types to numeric (or string).

# Cast the given columns to numeric so their type no longer depends on how H2O parsed each frame
def _convert_h2oframe_to_numeric(h2o_frame, training_columns):
    for column in training_columns:
        h2o_frame[column] = h2o_frame[column].asnumeric()
    return h2o_frame

Remember to apply this function to the frames used for both training and prediction.
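A related variation, which is a different technique from forcing columns to numeric, is to align the new frame to the training types up front instead of reacting to errors. The sketch below assumes you kept the training frame's `types` dict (for example saved as JSON next to the model) and compares it with `hf.types`, where H2O reports categorical columns as 'enum'. `plan_conversions` is a hypothetical helper; only `H2OFrame.types`, `asfactor()` and `asnumeric()` are real H2O calls, and it is worth spot-checking the converted values afterwards.

```python
# H2O type names as reported by H2OFrame.types; 'enum' is H2O's categorical type.
NUMERIC_TYPES = {"int", "real"}


def plan_conversions(train_types, test_types):
    """Map each mismatched column to the H2OFrame method that would align it
    with the training frame. Both arguments are {column: H2O type} dicts.
    Columns missing from the new frame, or with other mismatches, are skipped."""
    plan = {}
    for column, train_type in train_types.items():
        test_type = test_types.get(column)
        if test_type is None or test_type == train_type:
            continue
        if train_type == "enum":
            # categorical in training, something else in the new data
            plan[column] = "asfactor"
        elif train_type in NUMERIC_TYPES and test_type == "enum":
            # real-valued in training, categorical in the new data
            plan[column] = "asnumeric"
    return plan


# Hypothetical usage against live frames (needs a running H2O cluster):
# for column, method in plan_conversions(train_types, hf.types).items():
#     hf[column] = getattr(hf[column], method)()
```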

Upvotes: 4
