Reputation: 63
I am developing a sequence-to-sequence model (paper) for text generation. I am not using "teacher forcing" at the decoder side, i.e. the output of the decoder at t0 is fed as the input to the decoder at t1.
Now, in reality, the output of the decoder (LSTM/GRU) is passed through a Dense layer, which in turn produces the index of the predicted word, and that index is treated as the decoder's output.
But when feeding the output to the next step, should we feed h_t (i.e. the decoder's output / hidden state), or is the word embedding of the predicted word the correct choice?
Upvotes: 0
Views: 199
Reputation: 2276
The short answer is: probably both, but the hidden state h_t is essential.
Feeding the hidden state h_t is required to pass information about the entire sentence so far (not just the previous word) from one decoder time step to the next.
Feeding the embedding of the chosen word is not essential, but it is probably a good idea. This allows the decoder to condition on the previous choices it has already committed to.
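To make this concrete, here is a minimal sketch of a greedy decoding loop without teacher forcing. It is written in PyTorch with made-up layer sizes (your model may use Keras or different dimensions); the point is only to show that the hidden state h_t is carried across steps while the embedding of the previously predicted word is fed as the next input.

    import torch
    import torch.nn as nn

    vocab_size, emb_dim, hid_dim = 10000, 128, 256   # assumed sizes
    embedding = nn.Embedding(vocab_size, emb_dim)
    decoder_cell = nn.GRUCell(emb_dim, hid_dim)
    to_vocab = nn.Linear(hid_dim, vocab_size)         # the "Dense" layer

    def greedy_decode(h, sos_index, max_len=20):
        """h: initial hidden state from the encoder, shape (batch, hid_dim)."""
        token = torch.full((h.size(0),), sos_index, dtype=torch.long)
        outputs = []
        for _ in range(max_len):
            x = embedding(token)           # embedding of the previous prediction
            h = decoder_cell(x, h)         # hidden state h_t carries sentence context
            logits = to_vocab(h)           # Dense layer over the vocabulary
            token = logits.argmax(dim=-1)  # predicted word index, fed back next step
            outputs.append(token)
        return torch.stack(outputs, dim=1)

So the raw word index itself is never fed back directly; it is looked up in the embedding table, while h_t flows through the recurrent cell unchanged.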
Upvotes: 1