Lemon

Reputation: 1394

How to use Keras LSTM with word embeddings to predict word IDs

I am having trouble understanding how to get the correct output when using word embeddings in Keras. My settings are as follows:

My question:

The output batch will contain probabilities because of the softmax activation function. But what I want is the network to predict integers such that the output fits the target batch of integers. How can I "decode" the output such that I know which word the network is predicting? Or do I have to construct the network differently?

Edit 1:

I have changed the output and target batches from 2D arrays to 3D tensors. Instead of using a target batch of shape (batch_size, sequence_length) containing integer IDs, I now use a one-hot encoded 3D target tensor of shape (batch_size, sequence_length, vocab_size). To get the same format out of the network, I changed it to output sequences (by setting return_sequences=True in the LSTM layer). Further, the number of output neurons was changed to vocab_size, so the output layer now produces a batch of shape (batch_size, sequence_length, vocab_size). With this 3D encoding I can get the predicted word ID using tf.argmax(outputs, 2). This approach seems to work for the moment, but I would still be interested in whether it's possible to keep the 2D targets/outputs.
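For illustration, here is a minimal sketch of the setup described in this edit. The concrete sizes (vocab_size, embedding_dim, the number of LSTM units) are placeholder assumptions, not values from the original question:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense
from tensorflow.keras.utils import to_categorical

vocab_size = 1000       # assumed vocabulary size (including a dummy/padding id 0)
embedding_dim = 64      # assumed embedding dimension
sequence_length = 20
batch_size = 32

model = Sequential([
    tf.keras.Input(shape=(sequence_length,)),
    Embedding(vocab_size, embedding_dim),
    LSTM(128, return_sequences=True),          # one prediction per time step
    Dense(vocab_size, activation='softmax'),   # probabilities over the vocabulary
])
model.compile(loss='categorical_crossentropy', optimizer='adam')

# Dummy data: integer inputs, one-hot 3D targets (batch, seq_len, vocab_size)
x = np.random.randint(0, vocab_size, size=(batch_size, sequence_length))
y = to_categorical(
    np.random.randint(0, vocab_size, size=(batch_size, sequence_length)),
    num_classes=vocab_size,
)
model.fit(x, y, epochs=1)

# "Decode" the softmax output back to word IDs
probs = model.predict(x)                   # shape (batch, seq_len, vocab_size)
predicted_ids = np.argmax(probs, axis=2)   # equivalent to tf.argmax(outputs, 2)
```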

Upvotes: 1

Views: 1029

Answers (1)

Daniel Möller

Reputation: 86650

One solution, perhaps not the best, is to output one-hot vectors the size of your dictionary (including dummy words).

Your last layer must output shape (sequence_length, dictionary_size+1).

Your dense layer will already preserve the sequence_length dimension if you don't add any Flatten() or Reshape() before it, so it can simply be a Dense(dictionary_size+1).

You can use the function keras.utils.to_categorical() to transform an integer into a one-hot vector, and keras.backend.argmax() to transform a one-hot vector back into an integer.
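As a quick sketch of those two conversions (using the tf.keras backend API; dictionary_size and the sample IDs are assumed values, not from the question):

```python
import numpy as np
from tensorflow.keras.utils import to_categorical
from tensorflow.keras import backend as K

dictionary_size = 1000
word_ids = np.array([[4, 17, 0], [9, 2, 5]])   # (batch, sequence_length)

# Integer IDs -> one-hot targets of shape (batch, sequence_length, dictionary_size + 1)
one_hot = to_categorical(word_ids, num_classes=dictionary_size + 1)

# One-hot (or softmax) outputs -> integer IDs again
ids_back = K.argmax(one_hot, axis=-1)
print(K.eval(ids_back))   # recovers [[4 17 0], [9 2 5]]
```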

Unfortunately, this is sort of unpacking your embedding. It would be nice if it were possible to have a reverse embedding or something like that.

Upvotes: 1
