Reputation: 1066
I have a time series dataset named temp with 4 columns: Date, Minutes, Issues, and Reason no.
For the Reason no. column,
temp['REASON NO'].value_counts()
shows this output:
R13 158
R14 123
R4 101
R7 81
R2 40
R3 35
R5 31
R8 11
R15 9
R12 3
R6 2
R10 2
R9 1
Earlier I ran this code, which worked fine:
reason_no = enc.fit_transform(temp['REASON NO'].values.reshape(-1, 1))
After building the model, I wanted to forecast the values of Minutes, Issues, and Reason no. for the next week.
I tried this code:
seq_length = 7
last_week = df.iloc[-seq_length:, :]
last_reason_no = enc.transform(last_week['REASON NO'].values.reshape(-1, 1))
last_issue = enc.transform(last_week['Issue'].values.reshape(-1, 1))
last_minutes = scaler.transform(last_week['Minutes'].values.reshape(-1, 1))
last_X = np.hstack([last_reason_no, last_issue, last_minutes])
next_X = last_X.reshape(1, last_X.shape[0], last_X.shape[1])
for i in range(7):
    pred = model.predict(next_X)
    pred_minutes = scaler.inverse_transform(pred[:, 2].reshape(-1, 1))[0][0]
    pred_issue = enc.inverse_transform([np.argmax(pred[:, 1])])[0]
    pred_reason_no = enc.inverse_transform([np.argmax(pred[:, 0])])[0]
    print(f'Date: {last_week.iloc[-1, 0]}')
    print(f'Predicted Reason Number: {pred_reason_no}')
    print(f'Predicted Issue: {pred_issue}')
    print(f'Predicted Minutes: {pred_minutes}')
But when I ran this code, I got an error:
ValueError                                Traceback (most recent call last)
in <cell line: 1>()
----> 1 last_reason_no = enc.transform(last_week['REASON NO'].values.reshape(-1, 1))
2 frames
/usr/local/lib/python3.10/dist-packages/sklearn/preprocessing/_encoders.py in _transform(self, X, handle_unknown, force_all_finite, warn_on_unknown)
    172 " during transform".format(diff, i)
    173 )
--> 174 raise ValueError(msg)
    175 else:
    176 if warn_on_unknown:
ValueError: Found unknown categories ['R5', 'R4'] in column 0 during transform.
I would like to understand why I'm getting this error and how to fix it.
Upvotes: 1
Views: 99
Reputation: 120439
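You can't transform categories that the encoder never saw during fit. With the default settings, OneHotEncoder raises an error for any value that was not present in the data it was fitted on: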
import pandas as pd
from sklearn.preprocessing import OneHotEncoder
# Something like X_train, X_test = train_test_split(X, ...)
X_train = pd.DataFrame({'REASON NO': ['R13', 'R14', 'R7']})
X_test = pd.DataFrame({'REASON NO': ['R4', 'R7', 'R5']})
# Default handle_unknown='error': unseen categories raise in transform
enc = OneHotEncoder()
Output:
>>> enc.fit_transform(X_train).toarray()
array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])
>>> enc.transform(X_test)
...
ValueError: Found unknown categories ['R5', 'R4'] in column 0 during transform
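As for the fix, here is a minimal sketch using the same toy frames as above. It shows one way to check which values the encoder has not seen, and two possible workarounds: `handle_unknown='ignore'`, which encodes unseen values as all-zero rows, or passing the complete category list through the `categories` parameter. The variable names (`unseen`, `enc_ignore`, `enc_full`, `all_reasons`) are only illustrative, and which option fits depends on how your model should treat an unseen reason.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

X_train = pd.DataFrame({'REASON NO': ['R13', 'R14', 'R7']})
X_test = pd.DataFrame({'REASON NO': ['R4', 'R7', 'R5']})

# Diagnose: which values in the new data were never seen during fit?
enc = OneHotEncoder()
enc.fit(X_train)
unseen = sorted(set(X_test['REASON NO']) - set(enc.categories_[0]))
print(unseen)

# Option 1: encode unseen categories as all-zero rows instead of raising
enc_ignore = OneHotEncoder(handle_unknown='ignore')
enc_ignore.fit(X_train)
encoded = enc_ignore.transform(X_test).toarray()
print(encoded)

# Option 2: declare every possible category up front, so the encoder
# knows about values that do not appear in the training rows
all_reasons = sorted(set(X_train['REASON NO']) | set(X_test['REASON NO']))
enc_full = OneHotEncoder(categories=[all_reasons])
enc_full.fit(X_train)
encoded_full = enc_full.transform(X_test).toarray()
print(encoded_full.shape)
```

In your case it is also worth checking `enc.categories_` right before the failing line. R4 and R5 both appear in your value_counts() output, so if they are missing from `enc.categories_`, the encoder was probably refitted on different data (another column or a subset) after the first fit_transform. Using a separate encoder per column would avoid that.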
Upvotes: 1