def __init__

Reputation: 1066

Getting ValueError from enc.transform, where enc is OneHotEncoder(sparse_output=False), in pandas

I have a time-series dataset named temp with four columns: Date, Minutes, Issues, and Reason No.

In this dataset, running

temp['REASON NO'].value_counts()

shows this output:

R13    158
R14    123
R4     101
R7      81
R2      40
R3      35
R5      31
R8      11
R15      9
R12      3
R6       2
R10      2
R9       1

Earlier, I ran this code, which worked fine:

reason_no = enc.fit_transform(temp['REASON NO'].values.reshape(-1, 1))
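As a sanity check on this step, here is a small sketch that rebuilds a REASON NO column with the same counts as the value_counts() output above (the row order is made up, since only the counts are known). An encoder fitted this way learns all 13 categories, R4 and R5 included, which suggests that the later failure comes from `enc` having been refitted, or fitted on different data, before the failing `transform` call.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical reconstruction: same counts as value_counts(), arbitrary row order
counts = {'R13': 158, 'R14': 123, 'R4': 101, 'R7': 81, 'R2': 40, 'R3': 35, 'R5': 31,
          'R8': 11, 'R15': 9, 'R12': 3, 'R6': 2, 'R10': 2, 'R9': 1}
reason = np.repeat(list(counts.keys()), list(counts.values())).reshape(-1, 1)

enc = OneHotEncoder()
# Default output is sparse; .toarray() gives the dense matrix on any sklearn version
reason_no = enc.fit_transform(reason).toarray()

print(reason_no.shape)  # (597, 13): one row per record, one column per category
print('R4' in enc.categories_[0], 'R5' in enc.categories_[0])  # True True
```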

After building the model, I wanted to forecast the values of Minutes, Issues, and Reason No. for the next week.

I tried this code:

seq_length=7
last_week = df.iloc[-seq_length:, :]
last_reason_no = enc.transform(last_week['REASON NO'].values.reshape(-1, 1))
last_issue = enc.transform(last_week['Issue'].values.reshape(-1, 1))
last_minutes = scaler.transform(last_week['Minutes'].values.reshape(-1, 1))
last_X = np.hstack([last_reason_no, last_issue, last_minutes])
next_X = last_X.reshape(1, last_X.shape[0], last_X.shape[1])
for i in range(7):
    pred = model.predict(next_X)
    pred_minutes = scaler.inverse_transform(pred[:, 2].reshape(-1, 1))[0][0]
    pred_issue = enc.inverse_transform([np.argmax(pred[:, 1])])[0]
    pred_reason_no = enc.inverse_transform([np.argmax(pred[:, 0])])[0]
    print(f'Date: {last_week.iloc[-1, 0]}')
    print(f'Predicted Reason Number: {pred_reason_no}')
    print(f'Predicted Issue: {pred_issue}')
    print(f'Predicted Minutes: {pred_minutes}')
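A side note on the decoding lines in this loop: `enc.inverse_transform([np.argmax(...)])` passes a list holding a single integer index, while `OneHotEncoder.inverse_transform` expects a 2-D array with one column per learned category. A minimal sketch, assuming the model returns one score per REASON NO category (the encoder name `enc_reason` and the scores are hypothetical), of two ways to turn an argmax back into a label:

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical encoder fitted on a few REASON NO values
enc_reason = OneHotEncoder()
enc_reason.fit(np.array(['R13', 'R14', 'R4', 'R7']).reshape(-1, 1))

# Hypothetical model scores, one per category, in the order of enc_reason.categories_[0]
scores = np.array([0.1, 0.2, 0.6, 0.1])

# Option 1: index the learned categories directly
best = enc_reason.categories_[0][np.argmax(scores)]

# Option 2: build a (1, n_categories) one-hot row and let inverse_transform decode it
one_hot = np.zeros((1, len(enc_reason.categories_[0])))
one_hot[0, np.argmax(scores)] = 1
decoded = enc_reason.inverse_transform(one_hot)[0, 0]

print(best, decoded)  # R4 R4
```

Categories are stored in sorted order (`['R13', 'R14', 'R4', 'R7']` here), so index 2 maps to R4.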

But when I ran the forecasting code, I got an error:

ValueError                                Traceback (most recent call last)
in <cell line: 1>()
----> 1 last_reason_no = enc.transform(last_week['REASON NO'].values.reshape(-1, 1))


/usr/local/lib/python3.10/dist-packages/sklearn/preprocessing/_encoders.py in _transform(self, X, handle_unknown, force_all_finite, warn_on_unknown)
    172                     " during transform".format(diff, i)
    173                 )
--> 174                 raise ValueError(msg)
    175             else:
    176                 if warn_on_unknown:

ValueError: Found unknown categories ['R5', 'R4'] in column 0 during transform.

Why am I getting this error, and how can I fix it?
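One possible cause worth checking: the code above uses the same `enc` for both 'REASON NO' and 'Issue'. The question does not show how `enc` was fitted for the Issue column, so this is an assumption, but if `fit` or `fit_transform` was called on Issue after the REASON NO fit, that call replaces `enc.categories_`, and REASON NO values are then "unknown". A sketch with hypothetical Issue values, followed by the one-encoder-per-column alternative (`enc_reason` and `enc_issue` are illustrative names):

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-ins for the two categorical columns (Issue values are made up)
reason_col = np.array(['R13', 'R4', 'R5'], dtype=object).reshape(-1, 1)
issue_col = np.array(['Network', 'Power', 'Network'], dtype=object).reshape(-1, 1)

# A single shared encoder: the second fit discards the categories learned by the first
shared = OneHotEncoder()
shared.fit_transform(reason_col)
shared.fit_transform(issue_col)
print(shared.categories_)  # only the Issue categories remain

raised = False
try:
    shared.transform(reason_col)  # every REASON NO value is now unknown to the encoder
except ValueError as err:
    raised = True
    print(err)

# One encoder per column keeps each column's categories separate
enc_reason = OneHotEncoder().fit(reason_col)
enc_issue = OneHotEncoder().fit(issue_col)
reason_ohe = enc_reason.transform(reason_col).toarray()
issue_ohe = enc_issue.transform(issue_col).toarray()
print(reason_ohe.shape, issue_ohe.shape)  # (3, 3) (3, 2)
```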

Upvotes: 1

Views: 99

Answers (1)

Corralien

Reputation: 120439

During transform, you can't encode categories that the encoder never saw during fit:

import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Something like X_train, X_test = train_test_split(X, ...)
X_train = pd.DataFrame({'REASON NO': ['R13', 'R14', 'R7']})
X_test = pd.DataFrame({'REASON NO': ['R4', 'R7', 'R5']})

enc = OneHotEncoder()

Output:

>>> enc.fit_transform(X_train).toarray()
array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

>>> enc.transform(X_test)
...
ValueError: Found unknown categories ['R5', 'R4'] in column 0 during transform
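Two common ways around this, sketched on the same toy frames: `handle_unknown='ignore'` encodes unseen categories as all-zero rows instead of raising (the model then receives no information about those rows), while the `categories` parameter declares the full category list up front. The list below is taken from the value_counts() output in the question and is assumed to be complete.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

X_train = pd.DataFrame({'REASON NO': ['R13', 'R14', 'R7']})
X_test = pd.DataFrame({'REASON NO': ['R4', 'R7', 'R5']})

# Option 1: unseen categories become all-zero rows instead of raising
enc_ignore = OneHotEncoder(handle_unknown='ignore')
enc_ignore.fit(X_train)
ignored = enc_ignore.transform(X_test).toarray()
print(ignored)  # R4 and R5 -> [0, 0, 0]; R7 -> [0, 0, 1]

# Option 2: declare every known category, so each one gets its own column
# even if it is missing from the data passed to fit
all_reasons = ['R2', 'R3', 'R4', 'R5', 'R6', 'R7', 'R8',
               'R9', 'R10', 'R12', 'R13', 'R14', 'R15']
enc_full = OneHotEncoder(categories=[all_reasons])
enc_full.fit(X_train)
full = enc_full.transform(X_test).toarray()
print(full.shape)  # (3, 13)
```

Option 2 keeps the column layout stable between training and forecasting, which matters when the encoded columns feed a model with a fixed input width.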

Upvotes: 1
