Error in ParallelMapDatasetV2 when trying to predict with a Keras model

Question

I'm trying to train a simple binary text classifier with Keras, with two output classes: 0 or 1. I transformed my text data using TfidfVectorizer() from scikit-learn, and everything ran well until I tried to predict on the test set (here called X_test_tfid). I thought the cause might be that this is a sparse matrix that Keras can't handle well, so I also converted it to a TensorFlow SparseTensor with a very simple conversion function, but the same error occurs. I have no idea what could be happening, and I'd appreciate any advice. Note that if I call predict on the training set, everything runs just fine. Since the code is simple, here are the main parts:

import pandas as pd
from sklearn.metrics import ConfusionMatrixDisplay
from sklearn.metrics import classification_report
from sklearn.svm import SVC

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.utils import to_categorical


# X holds the raw text documents and y the 'F'/'M' labels (loaded earlier, not shown)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
vectorizer = TfidfVectorizer()

X_train_tfid = vectorizer.fit_transform(X_train)
X_test_tfid = vectorizer.transform(X_test)

# Transform y into categorical values so the neural network can read them
y_train_categorical = y_train.copy()
y_train_categorical.replace({'F': 0, 'M': 1}, inplace= True)
y_test_categorical = y_test.copy()
y_test_categorical.replace({'F': 0, 'M': 1}, inplace= True)
y_train_categorical = to_categorical(y_train_categorical)
y_test_categorical = to_categorical(y_test_categorical)


# Creating the simple model. Note that I hard-coded the input dimension (1388) here, but it is the correct number of TF-IDF features in my example

model = Sequential()
model.add(Dense(8, input_dim=1388, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(2, activation="softmax"))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

#Fitting the model
model.fit(X_train_tfid, y_train_categorical, epochs=10, verbose=1)

# Output (training ran without errors):

Epoch 1/10
45/45 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 1.0000 - loss: 0.1075
Epoch 2/10
45/45 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 1.0000 - loss: 0.0681
Epoch 3/10
45/45 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 1.0000 - loss: 0.0446
Epoch 4/10
45/45 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 1.0000 - loss: 0.0316
Epoch 5/10
45/45 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 1.0000 - loss: 0.0228
Epoch 6/10
45/45 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 1.0000 - loss: 0.0179
Epoch 7/10
45/45 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 1.0000 - loss: 0.0141
Epoch 8/10
45/45 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 1.0000 - loss: 0.0118
Epoch 9/10
45/45 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 1.0000 - loss: 0.0095
Epoch 10/10
45/45 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 1.0000 - loss: 0.0079


# Now predicting on the test set, which is where the error occurs:

keras_predict = model.predict(X_test_tfid)

#ERROR BELOW

2024-09-11 11:26:55.182607: W tensorflow/core/framework/op_kernel.cc:1840] OP_REQUIRES failed at ragged_gather_op.cc:77 : INVALID_ARGUMENT: indices[31] = 607 is not in [0, 607)
2024-09-11 11:26:55.182687: W tensorflow/core/framework/op_kernel.cc:1840] OP_REQUIRES failed at ragged_gather_op.cc:77 : INVALID_ARGUMENT: indices[31] = 607 is not in [0, 607)
2024-09-11 11:26:55.185070: W tensorflow/core/framework/op_kernel.cc:1840] OP_REQUIRES failed at ragged_gather_op.cc:77 : INVALID_ARGUMENT: indices[0] = 608 is not in [0, 607)
2024-09-11 11:26:55.185277: W tensorflow/core/framework/op_kernel.cc:1840] OP_REQUIRES failed at ragged_gather_op.cc:77 : INVALID_ARGUMENT: indices[0] = 608 is not in [0, 607)
---------------------------------------------------------------------------
InvalidArgumentError                      Traceback (most recent call last)
Cell In[103], line 1
----> 1 keras_predict = model.predict(X_test_tfid)

File ~/.conda/envs/python3.11/lib/python3.11/site-packages/keras/src/utils/traceback_utils.py:122, in filter_traceback..error_handler(*args, **kwargs)
    119     filtered_tb = _process_traceback_frames(e.__traceback__)
    120     # To get the full stack trace, call:
    121     # `keras.config.disable_traceback_filtering()`
--> 122     raise e.with_traceback(filtered_tb) from None
    123 finally:
    124     del filtered_tb

File ~/.conda/envs/python3.11/lib/python3.11/site-packages/tensorflow/python/framework/ops.py:5983, in raise_from_not_ok_status(e, name)
   5981 def raise_from_not_ok_status(e, name) -> NoReturn:
   5982   e.message += (" name: " + str(name if name is not None else ""))
-> 5983   raise core._status_to_exception(e) from None

InvalidArgumentError: {{function_node __wrapped__IteratorGetNext_output_types_1_device_/job:localhost/replica:0/task:0/device:CPU:0}} Error in user-defined function passed to ParallelMapDatasetV2:697 transformation with iterator: Iterator::Root::Prefetch::ParallelMapV2: indices[31] = 607 is not in [0, 607)
     [[{{node RaggedGather_1/RaggedGather}}]] [Op:IteratorGetNext] name:
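Some arithmetic that may help narrow this down (a reading of the log, not a confirmed diagnosis): the bound in `[0, 607)` looks like the number of rows in X_test_tfid, and Keras's `predict` uses a default `batch_size` of 32. Under that assumption the last batch is partial, and if it were filled to 32 rows, its 32nd entry (position 31) would be row 607, which is exactly the `indices[31] = 607` in the error. `batch_bounds` is a hypothetical helper that just lists the batch ranges:

```python
def batch_bounds(n_rows, batch_size=32):
    """Return the (start, stop) row range of every batch; the last one may be partial."""
    return [
        (start, min(start + batch_size, n_rows))
        for start in range(0, n_rows, batch_size)
    ]


# Assumption: 607 is the number of rows in X_test_tfid, read off "[0, 607)" in the log.
bounds = batch_bounds(607)
print(len(bounds), bounds[-1])  # 19 (576, 607): the last batch holds only 31 rows
```

If that reading is right, predicting in a single batch (`model.predict(X_test_tfid, batch_size=X_test_tfid.shape[0])`) or on a dense array should sidestep the failing path; both are untested guesses.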

As I mentioned, I also tried converting X_train_tfid and X_test_tfid to a SparseTensor with a very simple function (not the most efficient approach, but it works!). The same problem occurred: fit ran fine, but predict raised the same error.

# Converting the sparse matrix produced by scikit-learn's vectorizer into a TensorFlow SparseTensor
from tensorflow import SparseTensor

def transform_to_sparse_tensor(sparse_matrix):
    indexes = []
    values = []
    # Visit every (row, column) cell and keep only the nonzero entries
    for l in range(sparse_matrix.shape[0]):
        for c in range(sparse_matrix.shape[1]):
            if sparse_matrix[l, c] != 0:
                indexes.append([l, c])
                values.append(sparse_matrix[l, c])
    st = SparseTensor(indices=indexes, values=values, dense_shape=[sparse_matrix.shape[0], sparse_matrix.shape[1]])
    return st

X_train_tensor = transform_to_sparse_tensor(X_train_tfid)
X_test_tensor = transform_to_sparse_tensor(X_test_tfid)

Then I fit using X_train_tensor (it worked!), and when I predict with X_test_tensor I get exactly the same error.
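The nested loop in `transform_to_sparse_tensor` touches every row × column cell through scipy's element indexing, which gets slow quickly as the vocabulary grows. A vectorized alternative, sketched under the assumption that the input is a scipy sparse matrix (as TfidfVectorizer returns), reads the nonzeros straight from COO format. `csr_to_sparse_parts` is a hypothetical helper; it returns plain NumPy arrays so the sketch runs without TensorFlow. Note that this only speeds up the conversion: the question reports the same predict error with a SparseTensor input.

```python
import numpy as np
import scipy.sparse as sp


def csr_to_sparse_parts(sparse_matrix):
    """Return (indices, values, dense_shape) for the nonzero entries, in row-major order."""
    coo = sp.coo_matrix(sparse_matrix)
    coo.sum_duplicates()  # merge any repeated (row, col) entries
    keep = coo.data != 0  # drop explicitly stored zeros, like the loop's `!= 0` test
    rows, cols, data = coo.row[keep], coo.col[keep], coo.data[keep]
    order = np.lexsort((cols, rows))  # sort by row, then by column
    indices = np.column_stack((rows[order], cols[order])).astype(np.int64)
    values = data[order].astype(np.float32)  # float32 matches Keras's default float type
    dense_shape = np.array(coo.shape, dtype=np.int64)
    return indices, values, dense_shape
```

The three arrays can then be passed to `tf.sparse.SparseTensor(indices=indices, values=values, dense_shape=dense_shape)`. Casting the values to float32 is a design choice here; the original loop keeps scipy's float64.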

I imagine the fix should be simple, since there is no fancy code or unusual step here. Has anyone run into the same error and can help?
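A workaround that may be worth trying while the sparse input path is failing (a sketch, not verified on this dataset): densify the test matrix in chunks so Keras only ever receives plain NumPy arrays. `predict_in_dense_chunks` and `chunk_size` are hypothetical names; the sketch assumes one dense chunk fits comfortably in memory, which it should at roughly 1388 features.

```python
import numpy as np
import scipy.sparse as sp


def predict_in_dense_chunks(predict_fn, X_sparse, chunk_size=256):
    """Densify a scipy sparse matrix chunk by chunk and concatenate the predictions.

    predict_fn is any callable mapping a dense 2-D float32 array to a 2-D array,
    e.g. lambda a: model.predict(a, verbose=0).
    """
    X_csr = sp.csr_matrix(X_sparse)  # row slicing is cheap on CSR
    outputs = []
    for start in range(0, X_csr.shape[0], chunk_size):
        dense_chunk = X_csr[start:start + chunk_size].toarray().astype(np.float32)
        outputs.append(np.asarray(predict_fn(dense_chunk)))
    return np.concatenate(outputs, axis=0)
```

Usage would be `keras_predict = predict_in_dense_chunks(lambda a: model.predict(a, verbose=0), X_test_tfid)`. For a matrix this small, `model.predict(X_test_tfid.toarray())` is the simplest version of the same idea; chunking only matters once the dense matrix gets large.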
