Reputation: 2807
I am interested to know whether long sentences are good for tensor2tensor model training, and why or why not?
Upvotes: 0
Views: 124
Reputation: 2670
Ideally, the training data should have the same distribution of sentence lengths as the target test data. In machine translation, for example, if the final model is expected to translate long sentences, similarly long sentences should also be included in the training data. The Transformer model does not seem to generalize to sentences longer than those seen during training, but limiting the maximum sentence length in training allows larger batch sizes, which is helpful (Popel and Bojar, 2018). A rough sketch of how such a length limit can be configured is shown below.
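For illustration, here is a minimal sketch of capping sentence length in tensor2tensor by registering a custom hparams set on top of `transformer_base`. The specific values (a 70-token cap and a 6000-token batch) are assumptions chosen for the example, not recommendations, and exact defaults may differ between tensor2tensor versions:

```python
# Sketch: register a custom hparams set that limits training sentence length.
# Assumes the standard tensor2tensor registry API; values are illustrative.
from tensor2tensor.models import transformer
from tensor2tensor.utils import registry


@registry.register_hparams
def transformer_base_maxlen70():
    hparams = transformer.transformer_base()
    # Skip training examples longer than 70 subword tokens.
    hparams.max_length = 70
    # Token-based batch size: with shorter sentences, more of them fit per batch.
    hparams.batch_size = 6000
    return hparams
```

This hparams set can then be selected when launching training (e.g. via the `--hparams_set` flag of `t2t-trainer`), or the same two values can be overridden directly on an existing set with the `--hparams` flag.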
Upvotes: 1