How do I create a batch generator of different length sequences in TensorFlow Keras?

Question

There seem to be a lot of articles on creating data generators for computer vision tasks, but for some reason not many for NLP. I want to feed a text corpus whose examples vary in length into a standard RNN/LSTM/Transformer network. Each example can be as short as a few words or as long as a paragraph. Because of this discrepancy in length, it doesn't seem like a good idea to pad all the examples to the same length; e.g., a 5-word sentence shouldn't be padded with 200+ zeros. At least, that is my motivation for wanting to use a data generator. Is this possible in TensorFlow/Keras? And if so, how would I go about implementing it?

Answers (1)
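
One possible approach, offered as a minimal sketch rather than a definitive implementation: group examples of similar length into the same batch and pad each batch only to its own longest sequence. The sketch assumes the text has already been tokenized into lists of integer IDs and that ID 0 is reserved for padding. The function name `padded_batch_generator` and its parameters are hypothetical, chosen only for this illustration.

```python
import numpy as np


def padded_batch_generator(sequences, labels, batch_size=32, pad_value=0,
                           shuffle=True, seed=None):
    """Yield (x, y) batches forever, padding each batch to its own max length.

    Assumes `sequences` is a list of lists of integer token IDs and that
    `pad_value` is never used as a real token ID.
    """
    rng = np.random.default_rng(seed)
    # Sort example indices by length so each batch holds similar lengths.
    order = sorted(range(len(sequences)), key=lambda i: len(sequences[i]))
    batches = [order[i:i + batch_size] for i in range(0, len(order), batch_size)]

    while True:  # loop indefinitely; the caller decides how many steps to draw
        # Shuffle the order of batches (not their contents) each epoch.
        idx = rng.permutation(len(batches)) if shuffle else range(len(batches))
        for b in idx:
            batch = batches[b]
            max_len = max(len(sequences[i]) for i in batch)
            x = np.full((len(batch), max_len), pad_value, dtype=np.int32)
            for row, i in enumerate(batch):
                x[row, :len(sequences[i])] = sequences[i]
            y = np.asarray([labels[i] for i in batch])
            yield x, y


# Toy data: five "sentences" of different lengths, already mapped to token IDs.
seqs = [[4, 9], [7, 2, 5, 1, 3, 8], [6], [3, 3, 3], [1, 2, 3, 4, 5]]
labels = [0, 1, 0, 1, 1]

gen = padded_batch_generator(seqs, labels, batch_size=2, shuffle=False)
first_x, first_y = next(gen)
print(first_x.shape)  # (2, 2): the two shortest examples, padded to length 2
```

A generator like this can be passed to `model.fit(gen, steps_per_epoch=...)`, with `steps_per_epoch` set to the number of batches per epoch (here, the ceiling of `len(seqs) / batch_size`). For the model to ignore the padding, an `Embedding` layer with `mask_zero=True` is one common option. TensorFlow also has built-in routes to the same result, such as `tf.data.Dataset.padded_batch` or subclassing `tf.keras.utils.Sequence`; which one fits best depends on the input pipeline.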
