faysal

Reputation: 162

Getting an error while classifying text data using word2vec

I want to use my own word dataset to create the embeddings, and my own labelled data to train and test the model. I have already created the word embeddings with word2vec, but I am running into a problem when training the model on the labelled data.

I am getting an error while trying to train the model. My model creation code:

from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences

# create the tokenizer for the training texts
tokenizer = Tokenizer()
tokenizer.fit_on_texts(X_train)
encoded_docs = tokenizer.texts_to_sequences(X_train)

max_length = max([len(s.split()) for s in X_train])
X_train = pad_sequences(encoded_docs, maxlen=max_length, padding='post')

# create a second tokenizer for the test texts
tokenizer = Tokenizer()
tokenizer.fit_on_texts(X_test)
encoded_docs = tokenizer.texts_to_sequences(X_test)

X_test = pad_sequences(encoded_docs, maxlen=max_length, padding='post')


from keras.models import Sequential
from keras.layers import Embedding, Conv1D, MaxPooling1D, Flatten, Dense

# set up the embedding layer with the pre-trained word2vec weights
embeddings = Embedding(input_dim=embedding_matrix.shape[0], output_dim=embedding_matrix.shape[1],
                       weights=[embedding_matrix], input_length=max_length, trainable=False)

new_model = Sequential()
new_model.add(embeddings)
new_model.add(Conv1D(filters=128, kernel_size=5, activation='relu'))
new_model.add(MaxPooling1D(pool_size=2))
new_model.add(Flatten())
new_model.add(Dense(1, activation='sigmoid'))

And this is how I created the embedding matrix:

embedding_matrix = np.zeros((len(model.wv.vocab), vector_dim))
for i in range(len(model.wv.vocab)):
    embedding_vector = model.wv[model.wv.index2word[i]]
    if embedding_vector is not None:
        embedding_matrix[i] = embedding_vector

When I run this, I get the following error:

 WARNING:tensorflow:From /Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py:1290: calling reduce_mean (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
Epoch 1/10
Traceback (most recent call last):
  File "/Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1322, in _do_call
    return fn(*args)
  File "/Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1307, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1409, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[27,2] = 1049 is not in [0, 1045)
     [[Node: embedding_1/GatherV2 = GatherV2[Taxis=DT_INT32, Tindices=DT_INT32, Tparams=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](embedding_1/embeddings/read, embedding_1/Cast, embedding_1/GatherV2/axis)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/faysal/Desktop/My Computer/D/Code Workspace/Research-IoT/embedding-tut/src/main.py", line 359, in <module>
    custom_keras_model(embedding_matrix, model.wv)
  File "/Users/faysal/Desktop/My Computer/D/Code Workspace/Research-IoT/Collaboration/embedding-tut/src/main.py", line 295, in custom_keras_model
    new_model.fit(X_train, y_train, epochs=10, verbose=2)
  File "/Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/keras/models.py", line 867, in fit
    initial_epoch=initial_epoch)
  File "/Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/keras/engine/training.py", line 1598, in fit
    validation_steps=validation_steps)
  File "/Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/keras/engine/training.py", line 1183, in _fit_loop
    outs = f(ins_batch)
  File "/Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 2273, in __call__
    **self.session_kwargs)
  File "/Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 900, in run
    run_metadata_ptr)
  File "/Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1135, in _run
    feed_dict_tensor, options, run_metadata)
  File "/Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1316, in _do_run
    run_metadata)
  File "/Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1335, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[27,2] = 1049 is not in [0, 1045)
     [[Node: embedding_1/GatherV2 = GatherV2[Taxis=DT_INT32, Tindices=DT_INT32, Tparams=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](embedding_1/embeddings/read, embedding_1/Cast, embedding_1/GatherV2/axis)]]

Caused by op 'embedding_1/GatherV2', defined at:
  File "/Users/faysal/Desktop/My Computer/D/Code Workspace/Research-IoT/Collaboration/embedding-tut/src/main.py", line 359, in <module>
    custom_keras_model(embedding_matrix, model.wv)
  File "/Users/faysal/Desktop/My Computer/D/Code Workspace/Research-IoT/Collaboration/embedding-tut/src/main.py", line 278, in custom_keras_model
    new_model.add(embeddings)
  File "/Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/keras/models.py", line 442, in add
    layer(x)
  File "/Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/keras/engine/topology.py", line 602, in __call__
    output = self.call(inputs, **kwargs)
  File "/Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/keras/layers/embeddings.py", line 134, in call
    out = K.gather(self.embeddings, inputs)
  File "/Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 1134, in gather
    return tf.gather(reference, indices)
  File "/Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/tensorflow/python/ops/array_ops.py", line 2736, in gather
    return gen_array_ops.gather_v2(params, indices, axis, name=name)
  File "/Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/tensorflow/python/ops/gen_array_ops.py", line 3065, in gather_v2
    "GatherV2", params=params, indices=indices, axis=axis, name=name)
  File "/Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3392, in create_op
    op_def=op_def)
  File "/Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1718, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): indices[27,2] = 1049 is not in [0, 1045)
     [[Node: embedding_1/GatherV2 = GatherV2[Taxis=DT_INT32, Tindices=DT_INT32, Tparams=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](embedding_1/embeddings/read, embedding_1/Cast, embedding_1/GatherV2/axis)]]


Process finished with exit code 1

The error occurs when fitting the training data to the model. I think I have made a mistake in calculating the training data's shape and feeding it into the model.

Upvotes: 0

Views: 203

Answers (1)

ixeption

Reputation: 2050

You are using two different Tokenizers and fitting them separately on the train and test data. As a result, the token indices do not match between training and test. The error is raised because a token index (1049) occurs that lies outside the embedding matrix, which only has 1045 rows. Even if you fix that, the model will not work correctly as long as you keep two tokenizers.
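A quick sanity check makes the mismatch visible (a minimal sketch, assuming the padded X_train/X_test arrays and embedding_matrix from your question): the largest token index must stay below the number of rows in the embedding matrix, otherwise the embedding lookup fails exactly as in your traceback.

# every token index must be a valid row of embedding_matrix
print(X_train.max(), X_test.max())        # largest token indices after padding
print(embedding_matrix.shape[0])          # number of embedding rows (1045 here)
assert X_train.max() < embedding_matrix.shape[0]
assert X_test.max() < embedding_matrix.shape[0]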

What you should do is fit your Tokenizer on all of the data (X_train and X_test) and use just that one single Tokenizer, for example as in the sketch below.
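A minimal sketch of what that could look like, reusing the variable names from your question (X_train/X_test as raw texts, model as the trained gensim word2vec model, vector_dim as the embedding size). Building the embedding matrix from the tokenizer's own word_index keeps every token id inside the matrix:

from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
import numpy as np

# one tokenizer, fitted on all texts
tokenizer = Tokenizer()
tokenizer.fit_on_texts(X_train + X_test)

max_length = max(len(s.split()) for s in X_train)
X_train_pad = pad_sequences(tokenizer.texts_to_sequences(X_train), maxlen=max_length, padding='post')
X_test_pad = pad_sequences(tokenizer.texts_to_sequences(X_test), maxlen=max_length, padding='post')

# build the embedding matrix against the tokenizer's indices
# (index 0 is reserved for padding, hence the +1)
vocab_size = len(tokenizer.word_index) + 1
embedding_matrix = np.zeros((vocab_size, vector_dim))
for word, i in tokenizer.word_index.items():
    if word in model.wv.vocab:
        embedding_matrix[i] = model.wv[word]

This way the Embedding layer's input_dim (embedding_matrix.shape[0]) covers every index the tokenizer can produce, so the GatherV2 error above cannot occur.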

Upvotes: 1
