Reputation: 1394
I have problems understanding how to get the correct output when using word embeddings in Keras. My settings are as follows:
My inputs are batches of shape (batch_size, sequence_length). Each row in a batch represents one sentence; the words are represented by word IDs. The sentences are padded with zeros so that all have the same length.
For example, a (3, 6) input batch might look like: np.array([[1, 3, 5, 6, 0, 0], [1, 7, 4, 5, 8, 0], [1, 3, 8, 2, 7, 2]])
My targets are given by the input batch shifted one step, so for each input word I want to predict the next word: np.array([[3, 5, 6, 0, 0, 0], [7, 4, 5, 8, 0, 0], [3, 8, 2, 7, 2, 0]])
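As a concrete sketch (the IDs above are just toy values, with 0 as the padding symbol), the shifted targets can be built from the inputs like this:

import numpy as np

# Toy (3, 6) input batch from above; word id 0 is the padding symbol.
inputs = np.array([[1, 3, 5, 6, 0, 0],
                   [1, 7, 4, 5, 8, 0],
                   [1, 3, 8, 2, 7, 2]])

# Build the targets: the word at position t+1 becomes the target at
# position t, and the last position is padded with 0.
targets = np.zeros_like(inputs)
targets[:, :-1] = inputs[:, 1:]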
I feed such an input batch into the Keras embedding layer. My embedding size is 100, so the output will be a 3D tensor of shape (batch_size, sequence_length, embedding_size). So in the little example it is (3, 6, 100).
This 3D batch is fed into an LSTM layer. The output of the LSTM layer is fed into a Dense layer with sequence_length output neurons and a softmax activation function. So the shape of the output is the same as the shape of the input, namely (batch_size, sequence_length).
As a loss I am using the categorical crossentropy between the output and target batch.
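For reference, here is a minimal sketch of the setup as described so far (using the tf.keras import paths; vocab_size and the LSTM width of 128 are placeholders I chose, not values from the question):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

vocab_size = 10        # placeholder: highest word id + 1, including padding id 0
embedding_size = 100
sequence_length = 6

model = Sequential()
model.add(Embedding(vocab_size, embedding_size))         # -> (batch, seq_len, 100)
model.add(LSTM(128))                                     # last time step only: (batch, 128)
model.add(Dense(sequence_length, activation='softmax'))  # -> (batch, seq_len)
model.compile(loss='categorical_crossentropy', optimizer='adam')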
My question:
The output batch will contain probabilities because of the softmax activation function. But what I want is the network to predict integers such that the output fits the target batch of integers. How can I "decode" the output such that I know which word the network is predicting? Or do I have to construct the network differently?
Edit 1:
I have changed the output and target batches from 2D arrays to 3D tensors. Instead of using a target batch of size (batch_size, sequence_length) with integer IDs, I am now using a one-hot encoded 3D target tensor of shape (batch_size, sequence_length, vocab_size). To get the same format as the output of the network, I changed the network to output sequences (by setting return_sequences=True in the LSTM layer). Further, the number of output neurons was changed to vocab_size, so that the output layer now produces a batch of shape (batch_size, sequence_length, vocab_size).
With this 3D encoding I can get the predicted word ID using tf.argmax(outputs, 2). This approach seems to work for the moment, but I would still be interested in whether it's possible to keep the 2D targets/outputs.
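A minimal sketch of this revised setup, reusing the toy batch from above (vocab_size and the LSTM width are again placeholders):

import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense
from tensorflow.keras.utils import to_categorical

vocab_size = 10   # placeholder: highest word id + 1, including padding id 0

inputs = np.array([[1, 3, 5, 6, 0, 0],
                   [1, 7, 4, 5, 8, 0],
                   [1, 3, 8, 2, 7, 2]])
targets = np.array([[3, 5, 6, 0, 0, 0],
                    [7, 4, 5, 8, 0, 0],
                    [3, 8, 2, 7, 2, 0]])

model = Sequential()
model.add(Embedding(vocab_size, 100))
model.add(LSTM(128, return_sequences=True))          # one output per time step
model.add(Dense(vocab_size, activation='softmax'))   # -> (batch, seq_len, vocab_size)
model.compile(loss='categorical_crossentropy', optimizer='adam')

# One-hot encode the integer targets: (batch, seq_len) -> (batch, seq_len, vocab_size)
targets_3d = to_categorical(targets, num_classes=vocab_size)
model.fit(inputs, targets_3d, epochs=1, verbose=0)

# Decode the probability output back to word ids: (batch, seq_len)
predicted_ids = tf.argmax(model.predict(inputs), axis=2)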
Upvotes: 1
Views: 1029
Reputation: 86650
One solution, perhaps not the best, is to output one-hot vectors the size of your dictionary (including dummy words).
Your last layer must output (sequence_length, dictionary_size+1). Your Dense layer will already output the sequence_length if you don't add any Flatten() or Reshape() before it, so it should be a Dense(dictionary_size+1).
You can use the function keras.utils.to_categorical() to transform an integer into a one-hot vector and keras.backend.argmax() to transform a one-hot vector into an integer.
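A small sketch of the two conversions (dictionary_size is a placeholder; np.argmax is used here for decoding plain arrays, while keras.backend.argmax does the same thing on tensors inside a model):

import numpy as np
from tensorflow.keras.utils import to_categorical

dictionary_size = 9   # placeholder, so the one-hot vectors have length 10

# Integer word ids -> one-hot vectors of length dictionary_size + 1
word_ids = np.array([3, 5, 6, 0])
one_hot = to_categorical(word_ids, num_classes=dictionary_size + 1)

# One-hot vectors -> integer word ids again
decoded = np.argmax(one_hot, axis=-1)   # array([3, 5, 6, 0])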
Unfortunately, this is sort of unpacking your embedding. It would be nice if it were possible to have a reverse embedding or something like that.
Upvotes: 1