HauLuk

Reputation: 155

Keras - Predict on model after learning sentiment data throws errors

My problem is getting results from the predict method in Keras with the TensorFlow backend. But first, a short introduction.

I am using

I created a convolutional neural network as in this tutorial: https://blog.keras.io/using-pre-trained-word-embeddings-in-a-keras-model.html

I trained the network with 11842 prepared Twitter texts. The only change I made is that I have 3 possible result classes (0, 1, 2). I defined that in the following line:

preds = Dense(3, activation='softmax')(x)

The fit method works without a problem, and I achieve between 88% and 92% accuracy.

model_fit = model.fit(x_train, y_train, validation_data=(x_val, y_val), nb_epoch=10, batch_size=128)

After training I saved the model in .h5 format (this also works fine).

Now I am trying to load the model and predict with it. The first example (trained_model) uses the same data I used for training, because I wanted to compare the results. The second example (trained_model_2) uses new Twitter texts that I collected earlier.

trained_model = load_model("trained_model.h5")
prediction_result = trained_model.predict(data_train, batch_size=128)
print(prediction_result.shape)  # prints: (11842, 3)

trained_model_2 = load_model("trained_model.h5")
prediction_result_2 = trained_model_2.predict(data_predict, batch_size=128)

To compare the training dataset with the "live / new" dataset:

print(data_train.shape)  # (11842, 1000)
print(data_predict.shape)  # (46962, 1000)

Both arrays have dtype=int32.

The following line raises the first error:

prediction_result_2 = trained_model_2.predict(data_predict, batch_size=128)

tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[0,999] = 13608 is not in [0, 13480) [[Node: Gather_1 = Gather[Tindices=DT_INT32, Tparams=DT_FLOAT, validate_indices=true, _device="/job:localhost/replica:0/task:0/cpu:0"](embedding_1_W_1/read, _recv_input_1_1_0)]]

The following line raises the second error:

trained_model_2 = load_model("trained_model.h5")

InvalidArgumentError (see above for traceback): indices[0,999] = 13608 is not in [0, 13480) [[Node: Gather_1 = Gather[Tindices=DT_INT32, Tparams=DT_FLOAT, validate_indices=true, _device="/job:localhost/replica:0/task:0/cpu:0"](embedding_1_W_1/read, _recv_input_1_1_0)]]

EDIT: Source code of the methods I created. The method "trainModule" is only used to train the network and save it. The method "predict_sentiment" is used for my prediction tests. The first prediction_result works and returns a NumPy array with shape (11842, 3). Code - pastebin

The whole error output: Error output - pastebin

If any additional information is needed, I will update the question.

Upvotes: 3

Views: 1133

Answers (2)

HauLuk

Reputation: 155

The problem was that the trained model couldn't find some word indices in the embedding matrix, because I used a different vocabulary for training and for prediction. Since the embedding layer's vocabulary is fixed at training time, the new data must be tokenized with the same vocabulary as the training data.

In the end I only had to change the tokenizer from:

tokenizer_predict = Tokenizer(nb_words=MAX_NB_WORDS)
tokenizer_predict.fit_on_texts(texts_predict)
sequence_predict = tokenizer_predict.texts_to_sequences(predict_data)

To:

tokenizer_predict = Tokenizer(nb_words=MAX_NB_WORDS)
tokenizer_predict.fit_on_texts(texts_train)
sequence_predict = tokenizer_predict.texts_to_sequences(predict_data)
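
To illustrate why the indices stop matching, here is a minimal sketch in plain Python. ToyTokenizer is a hypothetical stand-in written for this example (it is not the Keras Tokenizer), and the texts are made up. It only shows the general idea: a tokenizer fitted on new texts assigns its own indices, which can exceed the number of rows in an embedding matrix built from the training vocabulary.

```python
from collections import Counter


class ToyTokenizer(object):
    """Hypothetical stand-in for a word-index tokenizer (not the Keras class)."""

    def __init__(self):
        self.word_index = {}

    def fit_on_texts(self, texts):
        # Rank words by frequency (ties alphabetically); index 0 is reserved for padding.
        counts = Counter(word for text in texts for word in text.lower().split())
        ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
        self.word_index = {word: i + 1 for i, (word, _) in enumerate(ranked)}

    def texts_to_sequences(self, texts):
        # Words that were never seen during fitting are skipped.
        return [[self.word_index[w] for w in text.lower().split() if w in self.word_index]
                for text in texts]


texts_train = ["good day", "bad day"]
texts_predict = ["what a surprisingly good day today"]

# Fitted on the training texts: the embedding matrix has one row per index.
train_tokenizer = ToyTokenizer()
train_tokenizer.fit_on_texts(texts_train)
embedding_rows = len(train_tokenizer.word_index) + 1  # +1 for the padding index 0

# Wrong: a tokenizer fitted on the new texts assigns its own, larger indices.
wrong_tokenizer = ToyTokenizer()
wrong_tokenizer.fit_on_texts(texts_predict)
wrong_sequences = wrong_tokenizer.texts_to_sequences(texts_predict)

# Right: reuse the training vocabulary for the new texts.
right_sequences = train_tokenizer.texts_to_sequences(texts_predict)

print(embedding_rows)                             # 4
print(max(max(seq) for seq in wrong_sequences))   # 6, outside [0, 4)
print(max(max(seq) for seq in right_sequences))   # 3, inside [0, 4)
```

In this toy case the wrongly fitted tokenizer produces index 6 for a matrix with only 4 rows, which is the same kind of mismatch as 13608 versus [0, 13480) in the error message.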

Upvotes: 2

Cahya

Reputation: 11

Maybe it tried to access a range of [0, 13480) with the index 13608 (which obviously will not work). Someone else had a similar issue: https://github.com/tensorflow/tensorflow/issues/2734 It seems they tried to access a vocabulary of [0, 10000) with the index 10535.
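
One way to confirm this before calling predict is to scan the input for indices outside the valid range. The sketch below is only an illustration: find_out_of_range is a hypothetical helper, and the small data_predict list is made-up toy data that reuses the two numbers from the error message.

```python
def find_out_of_range(sequences, num_embedding_rows):
    """Return (row, column, index) for every index outside [0, num_embedding_rows)."""
    bad = []
    for row, sequence in enumerate(sequences):
        for column, index in enumerate(sequence):
            if not 0 <= index < num_embedding_rows:
                bad.append((row, column, index))
    return bad


# Toy data: the last position of the first row holds the index from the error message.
data_predict = [[0, 5, 13608], [0, 0, 42]]
problems = find_out_of_range(data_predict, 13480)
print(problems)  # [(0, 2, 13608)]
```

An empty result would mean every index fits the embedding matrix; any entry points to a position that the lookup cannot resolve.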

Upvotes: 0
