I'm trying to train a simple text classification model in Keras with two output classes: 0 or 1. I transformed my text data using TfidfVectorizer() from scikit-learn and everything ran well until I tried to predict on the test dataset (here called X_test_tfid). I thought the cause might be that this is a sparse matrix and Keras does not handle it well, so I also converted it to a TensorFlow SparseTensor using a very simple conversion routine, but the same error occurs. I have no idea what could be happening and would appreciate any advice. Note that if I call predict on the training dataset, everything runs just fine. I'll copy the main parts of the code here, since it is short:
import pandas as pd
from sklearn.metrics import ConfusionMatrixDisplay
from sklearn.metrics import classification_report
from sklearn.svm import SVC
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
import numpy as np
from keras.utils import to_categorical
from keras.models import Sequential
from keras.layers import Dense
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
vectorizer = TfidfVectorizer()
X_train_tfid = vectorizer.fit_transform(X_train)
X_test_tfid = vectorizer.transform(X_test)
#Simply converting y to categorical values so it can be read by the neural network
y_train_categorical = y_train.copy()
y_train_categorical.replace({'F': 0, 'M': 1}, inplace= True)
y_test_categorical = y_test.copy()
y_test_categorical.replace({'F': 0, 'M': 1}, inplace= True)
y_train_categorical = to_categorical(y_train_categorical)
y_test_categorical = to_categorical(y_test_categorical)
#Creating the simple model - note that I hard-coded the input dimension here, but it is the correct number for my data
model = Sequential()
model.add(Dense(8, input_dim=1388, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(2, activation="softmax"))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
#Fitting the model
model.fit(X_train_tfid, y_train_categorical, epochs=10, verbose=1)
#Results: training ran well
Epoch 1/10
45/45 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 1.0000 - loss: 0.1075
Epoch 2/10
45/45 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 1.0000 - loss: 0.0681
Epoch 3/10
45/45 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 1.0000 - loss: 0.0446
Epoch 4/10
45/45 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 1.0000 - loss: 0.0316
Epoch 5/10
45/45 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 1.0000 - loss: 0.0228
Epoch 6/10
45/45 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 1.0000 - loss: 0.0179
Epoch 7/10
45/45 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 1.0000 - loss: 0.0141
Epoch 8/10
45/45 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 1.0000 - loss: 0.0118
Epoch 9/10
45/45 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 1.0000 - loss: 0.0095
Epoch 10/10
45/45 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 1.0000 - loss: 0.0079
<keras.src.callbacks.history.History at 0x77e705822d10>
#Now trying to predict with the model, which is when the error occurs:
keras_predict = model.predict(X_test_tfid)
#ERROR BELOW
2024-09-11 11:26:55.182607: W tensorflow/core/framework/op_kernel.cc:1840] OP_REQUIRES failed at ragged_gather_op.cc:77 : INVALID_ARGUMENT: indices[31] = 607 is not in [0, 607)
2024-09-11 11:26:55.182687: W tensorflow/core/framework/op_kernel.cc:1840] OP_REQUIRES failed at ragged_gather_op.cc:77 : INVALID_ARGUMENT: indices[31] = 607 is not in [0, 607)
2024-09-11 11:26:55.185070: W tensorflow/core/framework/op_kernel.cc:1840] OP_REQUIRES failed at ragged_gather_op.cc:77 : INVALID_ARGUMENT: indices[0] = 608 is not in [0, 607)
2024-09-11 11:26:55.185277: W tensorflow/core/framework/op_kernel.cc:1840] OP_REQUIRES failed at ragged_gather_op.cc:77 : INVALID_ARGUMENT: indices[0] = 608 is not in [0, 607)
---------------------------------------------------------------------------
InvalidArgumentError Traceback (most recent call last)
Cell In[103], line 1
----> 1 keras_predict = model.predict(X_test_tfid)
File ~/.conda/envs/python3.11/lib/python3.11/site-packages/keras/src/utils/traceback_utils.py:122, in filter_traceback.<locals>.error_handler(*args, **kwargs)
119 filtered_tb = _process_traceback_frames(e.__traceback__)
120 # To get the full stack trace, call:
121 # `keras.config.disable_traceback_filtering()`
--> 122 raise e.with_traceback(filtered_tb) from None
123 finally:
124 del filtered_tb
File ~/.conda/envs/python3.11/lib/python3.11/site-packages/tensorflow/python/framework/ops.py:5983, in raise_from_not_ok_status(e, name)
5981 def raise_from_not_ok_status(e, name) -> NoReturn:
5982 e.message += (" name: " + str(name if name is not None else ""))
-> 5983 raise core._status_to_exception(e) from None
InvalidArgumentError: {{function_node __wrapped__IteratorGetNext_output_types_1_device_/job:localhost/replica:0/task:0/device:CPU:0}} Error in user-defined function passed to ParallelMapDatasetV2:697 transformation with iterator: Iterator::Root::Prefetch::ParallelMapV2: indices[31] = 607 is not in [0, 607)
[[{{node RaggedGather_1/RaggedGather}}]] [Op:IteratorGetNext] name:
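One way to test the sparse-matrix hypothesis is to convert the SciPy sparse matrix to a dense NumPy array before calling predict. The sketch below is only an illustration: the random matrix is a hypothetical stand-in for X_test_tfid, its row count is arbitrary, and only the 1388 feature columns come from the model definition above. Whether dense input avoids the error has not been verified here.

```python
import numpy as np
from scipy import sparse

# Hypothetical stand-in for X_test_tfid (the real one comes from vectorizer.transform).
rng = np.random.default_rng(0)
dense_source = rng.random((607, 1388))
dense_source[dense_source < 0.99] = 0.0          # keep roughly 1% of the entries
X_test_sparse = sparse.csr_matrix(dense_source)  # TfidfVectorizer also returns CSR

# Densify and cast to float32, the dtype Keras layers use by default.
X_test_dense = X_test_sparse.toarray().astype(np.float32)

print(type(X_test_dense).__name__, X_test_dense.shape)
# Assumption: `model` is the Sequential model defined in the question.
# keras_predict = model.predict(X_test_dense)
```

Densifying costs rows x columns x 4 bytes of memory, which is small for a matrix of this size but may not scale to a much larger vocabulary.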
As I mentioned, I also tried converting X_test_tfid and X_train_tfid to a SparseTensor with a very simple algorithm (not the best way, but it works). The same problem occurred: the fit method worked just fine, but the predict method raised the same error.
#Transforming the sparse matrix created by the sklearn vectorizer into a TensorFlow SparseTensor
from tensorflow import SparseTensor
def transform_to_sparse_tensor(sparse_matrix):
    indexes = []
    values = []
    #Collect the [row, column] position and the value of every non-zero entry
    for l in range(sparse_matrix.shape[0]):
        for c in range(sparse_matrix.shape[1]):
            if sparse_matrix[l, c] != 0:
                indexes.append([l, c])
                values.append(sparse_matrix[l, c])
    st = SparseTensor(indices=indexes, values=values, dense_shape=[sparse_matrix.shape[0], sparse_matrix.shape[1]])
    return st
X_train_tensor = transform_to_sparse_tensor(X_train_tfid)
X_test_tensor = transform_to_sparse_tensor(X_test_tfid)
Then I fit the model using X_train_tensor (it worked!), but when I predict with X_test_tensor I get exactly the same error.
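The conversion itself can be done without looping over every cell. The following is a minimal sketch, assuming the input is any SciPy sparse matrix; the helper name sparse_tensor_components is hypothetical, and it only prepares the three arguments that tf.SparseTensor takes rather than building the tensor, so it runs without TensorFlow. It is not expected to change the predict error by itself.

```python
import numpy as np
from scipy import sparse

def sparse_tensor_components(sparse_matrix):
    """Return (indices, values, dense_shape) for a SciPy sparse matrix.

    Hypothetical helper: the result matches the arguments of tf.SparseTensor.
    """
    coo = sparse_matrix.tocoo(copy=True)  # copy so the caller's matrix is untouched
    coo.sum_duplicates()                  # merge repeated (row, col) entries
    coo.eliminate_zeros()                 # drop explicitly stored zeros
    # Sort entries in row-major order (row first, then column).
    order = np.lexsort((coo.col, coo.row))
    indices = np.column_stack((coo.row[order], coo.col[order])).astype(np.int64)
    values = coo.data[order]
    return indices, values, coo.shape

example = sparse.csr_matrix(np.array([[0.0, 0.5, 0.0],
                                      [0.25, 0.0, 0.75]]))
indices, values, dense_shape = sparse_tensor_components(example)
print(indices.tolist())  # [[0, 1], [1, 0], [1, 2]]
print(values.tolist())   # [0.5, 0.25, 0.75]

# Assumption: with TensorFlow installed, the pieces would be combined as
# tf.SparseTensor(indices=indices, values=values, dense_shape=dense_shape)
```

Working from the COO representation touches only the stored non-zero entries, whereas the nested loops visit all rows x columns cells.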
I imagine the cause is something very simple, since the code has no fancy steps. Has anyone run into the same error, and could you help?