Reputation: 235
I have been turning this thought over in my head for a long time now. In NMT, we pass the text in the source language into the encoder stage of the seq2seq model and the text in the target language into the decoder stage, and the system learns the conditional probability of each target word given the previous target words (and the source sentence), e.g. P(word x | previous n words). We train this with teacher forcing.
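To make the setup concrete, here is a minimal PyTorch sketch of the training step I have in mind (the model, vocabulary sizes, and tensor shapes are just placeholders, not from any particular system):

```python
import torch
import torch.nn as nn

SRC_VOCAB, TGT_VOCAB, EMB, HID = 1000, 1000, 64, 128

class Seq2Seq(nn.Module):
    def __init__(self):
        super().__init__()
        self.src_emb = nn.Embedding(SRC_VOCAB, EMB)
        self.tgt_emb = nn.Embedding(TGT_VOCAB, EMB)
        self.encoder = nn.GRU(EMB, HID, batch_first=True)
        self.decoder = nn.GRU(EMB, HID, batch_first=True)
        self.out = nn.Linear(HID, TGT_VOCAB)

    def forward(self, src, tgt_in):
        # Encode the source sentence; its final hidden state seeds the decoder.
        _, h = self.encoder(self.src_emb(src))
        # Teacher forcing: the decoder is fed the gold previous target tokens,
        # not its own predictions.
        dec_states, _ = self.decoder(self.tgt_emb(tgt_in), h)
        return self.out(dec_states)      # logits over the target vocabulary

model = Seq2Seq()
loss_fn = nn.CrossEntropyLoss()

src     = torch.randint(0, SRC_VOCAB, (8, 12))  # batch of source sentences
tgt     = torch.randint(0, TGT_VOCAB, (8, 11))  # batch of target sentences
tgt_in  = tgt[:, :-1]                           # gold target, shifted right
tgt_out = tgt[:, 1:]                            # next-word labels

logits = model(src, tgt_in)
loss = loss_fn(logits.reshape(-1, TGT_VOCAB), tgt_out.reshape(-1))
loss.backward()                                 # one teacher-forced training step
```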
But what if I pass in the input sentence again as input to the decoder stage instead of the target sentence? What would it learn in this case? I'm guessing it would learn to predict the most probable next word in the sentence given the previous text, right? What are your thoughts?
Thanks in advance
Upvotes: 0
Views: 42
Reputation: 11250
In that case, you would be learning a model that copies the input symbols to the output. It is trivial for the attention mechanism to learn the identity correspondence between the encoder and decoder states, and RNNs can easily implement a counter. The model therefore won't provide any realistic estimate of next-word probability: it will assign most of the probability mass to the corresponding word in the source sentence.
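To illustrate, here is a rough, self-contained PyTorch sketch of the degenerate setup you describe (all names, sizes, and the toy attention layer are illustrative, not from any real NMT toolkit):

```python
import torch
import torch.nn as nn

VOCAB, EMB, HID = 1000, 64, 128

emb = nn.Embedding(VOCAB, EMB)
encoder = nn.GRU(EMB, HID, batch_first=True)
decoder = nn.GRU(EMB, HID, batch_first=True)
attn = nn.MultiheadAttention(HID, num_heads=1, batch_first=True)
out = nn.Linear(HID, VOCAB)
loss_fn = nn.CrossEntropyLoss()

src = torch.randint(0, VOCAB, (8, 12))      # a batch of source sentences
enc_states, h = encoder(emb(src))           # encoder states exposed to attention

dec_in  = src[:, :-1]                       # the same source, shifted right
dec_out = src[:, 1:]                        # "target" = the same source

dec_states, _ = decoder(emb(dec_in), h)
ctx, attn_weights = attn(dec_states, enc_states, enc_states)
logits = out(ctx + dec_states)

loss = loss_fn(logits.reshape(-1, VOCAB), dec_out.reshape(-1))
loss.backward()
```

Because every word the decoder has to predict is already visible among the encoder states, the cheapest solution during training is for `attn_weights` to converge toward a (shifted) identity alignment: decoder position i attends to encoder position i+1 and copies it, rather than learning genuine next-word statistics.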
Upvotes: 1