Reputation: 45
I am learning about machine translation with Transformers. To my knowledge, the model predicts the next word of the target sentence based on the previous target words and the source sentence. However, in the MarianMT model (or T5), I find that the tokenizer does not have a start-of-sentence token (<cls> or <s>). I think such a token is needed to start predicting the first word of the target sentence.
Can anyone explain to me how the MarianMT model will predict the first word in the target sentence?
Thank you.
Upvotes: 1
Views: 146
Reputation: 28505
From the documentation:
the model starts generating with pad_token_id (which has 0 as a token_embedding) as the prefix (Bart uses <s/>)
So it does not need an SOS token: it uses the padding token as the first decoder token, both during training and when generating.
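To make this concrete, here is a minimal sketch of a greedy decoding loop in which the decoder's input sequence is seeded with the pad token id instead of a dedicated SOS token. Everything here is a toy assumption for illustration: `dummy_decoder` stands in for the real MarianMT decoder, and the token ids (other than pad = 0, which matches the documentation quote above) are made up.

```python
PAD_TOKEN_ID = 0  # MarianMT starts generation with this, per the docs quote
EOS_TOKEN_ID = 3  # hypothetical end-of-sequence id for this toy example


def dummy_decoder(decoder_input_ids):
    # Stand-in for the real model: at each step it would attend to the
    # source sentence and the decoder inputs so far, then return the most
    # likely next token. Here it just emits a fixed sequence.
    step = len(decoder_input_ids) - 1
    return [10, 11, 12, EOS_TOKEN_ID][min(step, 3)]


def greedy_generate(max_len=10):
    # The "start" prefix is the pad token, not a <s>/SOS token.
    decoder_input_ids = [PAD_TOKEN_ID]
    while len(decoder_input_ids) < max_len:
        next_id = dummy_decoder(decoder_input_ids)
        decoder_input_ids.append(next_id)
        if next_id == EOS_TOKEN_ID:
            break
    return decoder_input_ids


print(greedy_generate())  # [0, 10, 11, 12, 3]
```

The first "real" target word (here the token with id 10) is predicted from the pad token alone, which is exactly how the model predicts the first word without an explicit SOS token. In the actual library this is controlled by the model config's `decoder_start_token_id`.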
Upvotes: 1