Reputation: 1394
I have problems understanding how to get the correct output when using word embeddings in Keras. My settings are as follows:
My inputs are batches of shape (batch_size, sequence_length). Each row in a batch represents one sentence; the words are represented by word IDs. The sentences are padded with zeros so that all have the same length.
For example, a (3, 6) input batch might look like: np.array([[1, 3, 5, 6, 0, 0], [1, 7, 4, 5, 8, 0], [1, 3, 8, 2, 7, 2]])
My targets are given by the input batch shifted one step, so for each input word I want to predict the next word: np.array([[3, 5, 6, 0, 0, 0], [7, 4, 5, 8, 0, 0], [3, 8, 2, 7, 2, 0]])
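As a concrete sketch (the IDs above are just toy values, with 0 as the padding symbol), the shifted targets can be built from the inputs like this:

import numpy as np

# Toy (3, 6) input batch from above; word id 0 is the padding symbol.
inputs = np.array([[1, 3, 5, 6, 0, 0],
                   [1, 7, 4, 5, 8, 0],
                   [1, 3, 8, 2, 7, 2]])

# Build the targets: the word at position t+1 becomes the target at
# position t, and the last position is padded with 0.
targets = np.zeros_like(inputs)
targets[:, :-1] = inputs[:, 1:]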
I feed such an input batch into the Keras embedding layer. My embedding size is 100, so the output will be a 3D tensor of shape (batch_size, sequence_length, embedding_size). So in the little example it is (3, 6, 100).
This 3D batch is fed into an LSTM layer. The output of the LSTM layer is fed into a Dense layer with sequence_length output neurons and a softmax activation function. So the shape of the output is the same as the shape of the input, namely (batch_size, sequence_length).
As a loss I am using the categorical crossentropy between the output and target batch.
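For reference, here is a minimal sketch of the setup as described so far (using the tf.keras import paths; vocab_size and the LSTM width of 128 are placeholders I chose, not values from the question):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

vocab_size = 10        # placeholder: highest word id + 1, including padding id 0
embedding_size = 100
sequence_length = 6

model = Sequential()
model.add(Embedding(vocab_size, embedding_size))         # -> (batch, seq_len, 100)
model.add(LSTM(128))                                     # last time step only: (batch, 128)
model.add(Dense(sequence_length, activation='softmax'))  # -> (batch, seq_len)
model.compile(loss='categorical_crossentropy', optimizer='adam')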
My question:
The output batch will contain probabilities because of the softmax activation function. But what I want is the network to predict integers such that the output fits the target batch of integers. How can I "decode" the output such that I know which word the network is predicting? Or do I have to construct the network differently?
Edit 1:
I have changed the output and target batches from 2D arrays to 3D tensors. Instead of using a target batch of size (batch_size, sequence_length) with integer IDs, I am now using a one-hot encoded 3D target tensor of shape (batch_size, sequence_length, vocab_size). To get the same format as the output of the network, I changed the network to output sequences (by setting return_sequences=True in the LSTM layer). Further, the number of output neurons was changed to vocab_size, so that the output layer now produces a batch of shape (batch_size, sequence_length, vocab_size).
With this 3D encoding I can get the predicted word ID using tf.argmax(outputs, 2). This approach seems to work for the moment, but I would still be interested in whether it's possible to keep the 2D targets/outputs.
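A minimal sketch of this revised setup, reusing the toy batch from above (vocab_size and the LSTM width are again placeholders):

import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense
from tensorflow.keras.utils import to_categorical

vocab_size = 10   # placeholder: highest word id + 1, including padding id 0

inputs = np.array([[1, 3, 5, 6, 0, 0],
                   [1, 7, 4, 5, 8, 0],
                   [1, 3, 8, 2, 7, 2]])
targets = np.array([[3, 5, 6, 0, 0, 0],
                    [7, 4, 5, 8, 0, 0],
                    [3, 8, 2, 7, 2, 0]])

model = Sequential()
model.add(Embedding(vocab_size, 100))
model.add(LSTM(128, return_sequences=True))          # one output per time step
model.add(Dense(vocab_size, activation='softmax'))   # -> (batch, seq_len, vocab_size)
model.compile(loss='categorical_crossentropy', optimizer='adam')

# One-hot encode the integer targets: (batch, seq_len) -> (batch, seq_len, vocab_size)
targets_3d = to_categorical(targets, num_classes=vocab_size)
model.fit(inputs, targets_3d, epochs=1, verbose=0)

# Decode the probability output back to word ids: (batch, seq_len)
predicted_ids = tf.argmax(model.predict(inputs), axis=2)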
Upvotes: 1
Views: 1029
Reputation: 86650
One solution, perhaps not the best, is to output one-hot vectors the size of your dictionary (including dummy words).
Your last layer must output (sequence_length, dictionary_size+1). Your Dense layer will already output the sequence_length if you don't add any Flatten() or Reshape() before it, so it should be a Dense(dictionary_size+1).
You can use the function keras.utils.to_categorical() to transform an integer into a one-hot vector and keras.backend.argmax() to transform a one-hot vector into an integer.
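A small sketch of the two conversions (dictionary_size is a placeholder; np.argmax is used here for decoding plain arrays, while keras.backend.argmax does the same thing on tensors inside a model):

import numpy as np
from tensorflow.keras.utils import to_categorical

dictionary_size = 9   # placeholder, so the one-hot vectors have length 10

# Integer word ids -> one-hot vectors of length dictionary_size + 1
word_ids = np.array([3, 5, 6, 0])
one_hot = to_categorical(word_ids, num_classes=dictionary_size + 1)

# One-hot vectors -> integer word ids again
decoded = np.argmax(one_hot, axis=-1)   # array([3, 5, 6, 0])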
Unfortunately, this is sort of unpacking your embedding. It would be nice if it were possible to have a reverse embedding or something like that.
Upvotes: 1