I want to use my own word dataset to create the embeddings, and my own labeled data to train and test my model. I have already created my own word embeddings with word2vec, but I am having trouble training the model on the labeled data.
I get an error when I try to train the model. Here is my model creation code:
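import numpy as np
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.models import Sequential
from keras.layers import Embedding, Conv1D, MaxPooling1D, Flatten, Dense

# create the tokenizer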
tokenizer = Tokenizer()
tokenizer.fit_on_texts(X_train)
encoded_docs = tokenizer.texts_to_sequences(X_train)
max_length = max([len(s.split()) for s in X_train])
X_train = pad_sequences(encoded_docs, maxlen=max_length, padding='post')
tokenizer = Tokenizer()
tokenizer.fit_on_texts(X_test)
encoded_docs = tokenizer.texts_to_sequences(X_test)
X_test = pad_sequences(encoded_docs, maxlen=max_length, padding='post')
# setup the embedding layer
embeddings = Embedding(input_dim=embedding_matrix.shape[0], output_dim=embedding_matrix.shape[1],
                       weights=[embedding_matrix], input_length=max_length, trainable=False)
new_model = Sequential()
new_model.add(embeddings)
new_model.add(Conv1D(filters=128, kernel_size=5, activation='relu'))
new_model.add(MaxPooling1D(pool_size=2))
new_model.add(Flatten())
new_model.add(Dense(1, activation='sigmoid'))
And this is how I created the embedding matrix:
embedding_matrix = np.zeros((len(model.wv.vocab), vector_dim))
for i in range(len(model.wv.vocab)):
    embedding_vector = model.wv[model.wv.index2word[i]]
    if embedding_vector is not None:
        embedding_matrix[i] = embedding_vector
Doing so, I get the following error:
WARNING:tensorflow:From /Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py:1290: calling reduce_mean (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
Epoch 1/10
Traceback (most recent call last):
File "/Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1322, in _do_call
return fn(*args)
File "/Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1307, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1409, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[27,2] = 1049 is not in [0, 1045)
[[Node: embedding_1/GatherV2 = GatherV2[Taxis=DT_INT32, Tindices=DT_INT32, Tparams=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](embedding_1/embeddings/read, embedding_1/Cast, embedding_1/GatherV2/axis)]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/faysal/Desktop/My Computer/D/Code Workspace/Research-IoT/embedding-tut/src/main.py", line 359, in <module>
custom_keras_model(embedding_matrix, model.wv)
File "/Users/faysal/Desktop/My Computer/D/Code Workspace/Research-IoT/Collaboration/embedding-tut/src/main.py", line 295, in custom_keras_model
new_model.fit(X_train, y_train, epochs=10, verbose=2)
File "/Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/keras/models.py", line 867, in fit
initial_epoch=initial_epoch)
File "/Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/keras/engine/training.py", line 1598, in fit
validation_steps=validation_steps)
File "/Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/keras/engine/training.py", line 1183, in _fit_loop
outs = f(ins_batch)
File "/Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 2273, in __call__
**self.session_kwargs)
File "/Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 900, in run
run_metadata_ptr)
File "/Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1135, in _run
feed_dict_tensor, options, run_metadata)
File "/Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1316, in _do_run
run_metadata)
File "/Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1335, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[27,2] = 1049 is not in [0, 1045)
[[Node: embedding_1/GatherV2 = GatherV2[Taxis=DT_INT32, Tindices=DT_INT32, Tparams=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](embedding_1/embeddings/read, embedding_1/Cast, embedding_1/GatherV2/axis)]]
Caused by op 'embedding_1/GatherV2', defined at:
File "/Users/faysal/Desktop/My Computer/D/Code Workspace/Research-IoT/Collaboration/embedding-tut/src/main.py", line 359, in <module>
custom_keras_model(embedding_matrix, model.wv)
File "/Users/faysal/Desktop/My Computer/D/Code Workspace/Research-IoT/Collaboration/embedding-tut/src/main.py", line 278, in custom_keras_model
new_model.add(embeddings)
File "/Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/keras/models.py", line 442, in add
layer(x)
File "/Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/keras/engine/topology.py", line 602, in __call__
output = self.call(inputs, **kwargs)
File "/Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/keras/layers/embeddings.py", line 134, in call
out = K.gather(self.embeddings, inputs)
File "/Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 1134, in gather
return tf.gather(reference, indices)
File "/Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/tensorflow/python/ops/array_ops.py", line 2736, in gather
return gen_array_ops.gather_v2(params, indices, axis, name=name)
File "/Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/tensorflow/python/ops/gen_array_ops.py", line 3065, in gather_v2
"GatherV2", params=params, indices=indices, axis=axis, name=name)
File "/Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3392, in create_op
op_def=op_def)
File "/Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1718, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
InvalidArgumentError (see above for traceback): indices[27,2] = 1049 is not in [0, 1045)
[[Node: embedding_1/GatherV2 = GatherV2[Taxis=DT_INT32, Tindices=DT_INT32, Tparams=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](embedding_1/embeddings/read, embedding_1/Cast, embedding_1/GatherV2/axis)]]
Process finished with exit code 1
The error occurs when fitting the training data to the model. I think I made a mistake in computing the shape of the training data and feeding it into the model.
Answer:
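You are using two different Tokenizers and fitting them separately on the training and the test data, so the same word gets different indices in X_train and X_test. The error itself is caused by a token index (1049) that is out of range for your Embedding layer: its input_dim is embedding_matrix.shape[0], i.e. 1045 (the size of your word2vec vocabulary), so only indices 0 to 1044 are valid, but your Tokenizer has indexed more words than that. Even if you fix that, your model will not work as long as you use two Tokenizers.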
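To make both problems concrete, here is a small standard-library sketch. It does not use Keras: build_word_index, encode and out_of_range are hypothetical helpers that only imitate how Tokenizer assigns ids (1-based, most frequent word first, 0 left for padding; punctuation filtering is skipped). It shows that two separately fitted indices give the same word different ids, and it lists every id that an embedding with input_dim rows could not look up, which is the condition behind "indices[27,2] = 1049 is not in [0, 1045)".

```python
from collections import Counter


def build_word_index(texts):
    """Hypothetical stand-in for Tokenizer.fit_on_texts/word_index: words get
    1-based ids, most frequent first (ties keep first-seen order), so id 0
    stays free for padding."""
    counts = Counter(w for text in texts for w in text.lower().split())
    # sorted() is stable, so words with equal counts keep first-seen order
    ranked = sorted(counts, key=lambda w: -counts[w])
    return {word: i + 1 for i, word in enumerate(ranked)}


def encode(texts, word_index):
    """Stand-in for texts_to_sequences: words missing from the index are dropped."""
    return [[word_index[w] for w in t.lower().split() if w in word_index] for t in texts]


def out_of_range(sequences, input_dim):
    """List (row, position, id) for every id an embedding with input_dim rows
    cannot look up, i.e. every id outside [0, input_dim)."""
    return [(r, p, i)
            for r, seq in enumerate(sequences)
            for p, i in enumerate(seq)
            if not 0 <= i < input_dim]


train = ["the sensor is hot", "the sensor is cold"]
test = ["cold sensor"]

train_index = build_word_index(train)  # "tokenizer" fit on train only
test_index = build_word_index(test)    # second, independent "tokenizer"
print(train_index["cold"], test_index["cold"])  # 5 1: same word, different ids

# Suppose the embedding matrix only has 4 rows (valid ids 0..3), e.g. because
# it was built from a smaller vocabulary than the tokenizer's.
print(out_of_range(encode(train, train_index), input_dim=4))  # [(0, 3, 4), (1, 3, 5)]
```

Running a check like out_of_range on your padded X_train with input_dim=embedding_matrix.shape[0] before calling fit should flag the offending ids.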
What you should do is fit a single Tokenizer on all of your data (X_train and X_test) and use that one Tokenizer to encode both sets.
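Below is a sketch of that fix under the same assumptions (hypothetical standard-library stand-ins, toy texts, and toy 2-dimensional vectors in place of your word2vec model). One index is fit on the training and test texts together, both sets are encoded and padded with it, and the embedding matrix is built from that same index, so row i holds the vector of the word encoded as i. That alignment matters as well: your current matrix is ordered by word2vec's index2word, which is generally not the order the Tokenizer uses, and its row 0 holds a real word's vector even though the Tokenizer reserves id 0 for padding.

```python
from collections import Counter


def build_word_index(texts):
    """Hypothetical stand-in for Tokenizer.fit_on_texts: 1-based ids, most
    frequent first (ties keep first-seen order); id 0 is reserved for padding."""
    counts = Counter(w for text in texts for w in text.lower().split())
    ranked = sorted(counts, key=lambda w: -counts[w])
    return {word: i + 1 for i, word in enumerate(ranked)}


def encode(texts, word_index, maxlen):
    """Stand-in for texts_to_sequences + pad_sequences(padding='post'):
    unknown words are dropped, over-long sequences lose their first ids
    (like the default truncating='pre'), short ones are right-padded with 0.
    Assumes maxlen >= 1."""
    out = []
    for t in texts:
        ids = [word_index[w] for w in t.lower().split() if w in word_index]
        ids = ids[-maxlen:]
        out.append(ids + [0] * (maxlen - len(ids)))
    return out


def build_embedding_matrix(word_index, vectors, dim):
    """Row i holds the vector of the word whose id is i, so the embedding
    lookup matches the tokenizer. Row 0 (padding) and words without a vector
    stay all-zero. `vectors` is any word -> list-of-floats mapping, standing
    in for a gensim KeyedVectors (model.wv)."""
    matrix = [[0.0] * dim for _ in range(len(word_index) + 1)]
    for word, i in word_index.items():
        if word in vectors:
            matrix[i] = list(vectors[word])
    return matrix


X_train_raw = ["the sensor is hot", "the sensor is cold"]
X_test_raw = ["cold sensor reading"]

word_index = build_word_index(X_train_raw + X_test_raw)  # ONE index for both sets
max_length = max(len(s.split()) for s in X_train_raw)
X_train = encode(X_train_raw, word_index, max_length)
X_test = encode(X_test_raw, word_index, max_length)

# Toy vectors standing in for the word2vec model; "reading" has no vector.
vectors = {"the": [0.1, 0.2], "sensor": [0.3, 0.4], "is": [0.5, 0.6],
           "hot": [0.7, 0.8], "cold": [0.9, 1.0]}
embedding_matrix = build_embedding_matrix(word_index, vectors, dim=2)

input_dim = len(embedding_matrix)  # the value to pass as Embedding(input_dim=...)
assert all(0 <= i < input_dim for seq in X_train + X_test for i in seq)
```

With the real Keras and gensim objects, the equivalent steps would be tokenizer.fit_on_texts(list(X_train) + list(X_test)) on the raw texts, texts_to_sequences and pad_sequences on each set with that same tokenizer, and an embedding_matrix of shape (len(tokenizer.word_index) + 1, vector_dim) filled with model.wv[word] for every word in tokenizer.word_index that is in model.wv; embedding_matrix.shape[0] then covers every id the tokenizer can produce.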