Reputation: 11686
I am looking at the recurrent neural network walkthrough here. In the tutorial there is a line that does:
dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)
However, if you're building the model with the Sequential API, is that still necessary? Reading the Sequential documentation, it seems a shuffle is performed automatically. If not, why is it done here? Is there a simple numerical example of the effect?
Upvotes: 0
Views: 262
Reputation: 4289
tf.keras.models.Sequential (through Model.fit) can also batch and shuffle data, similar to what tf.data.Dataset does. These preprocessing features are provided by fit because it accepts data in several forms: NumPy arrays, a tf.data.Dataset, a dict of arrays, or a tf.keras.utils.Sequence.
The tf.data.Dataset API provides the same features because it is consistent with other TensorFlow APIs, in which Keras is not involved.
The shuffling and batching do not need to be done twice. You may remove that line if you wish; it will not break training. Note, though, that when a tf.data.Dataset is passed to fit, the shuffle argument of fit has no effect, so dropping dataset.shuffle( ... ) means the data will not be shuffled at all. I think the author used tf.data.Dataset simply as the standard way to feed data into a Keras model, and dataset.shuffle( ... ).batch( ... ) is the idiomatic pattern with Dataset.
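As for the "simple numerical example" part of the question, here is a pure-Python sketch of what shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True) does. This is an illustrative analogue, not TensorFlow's actual implementation: Dataset.shuffle keeps a sliding buffer of buffer_size elements and draws uniformly from it, and batch groups consecutive elements, with drop_remainder discarding a final short batch.

```python
import random

def shuffle_and_batch(data, buffer_size, batch_size, drop_remainder=True, seed=0):
    # Illustrative analogue (an assumption, not the TF source) of
    # dataset.shuffle(buffer_size).batch(batch_size, drop_remainder).
    rng = random.Random(seed)
    buffer, shuffled = [], []
    for item in data:
        # shuffle(): keep a buffer of up to `buffer_size` elements and,
        # once it overflows, emit a uniformly random element from it
        buffer.append(item)
        if len(buffer) > buffer_size:
            shuffled.append(buffer.pop(rng.randrange(len(buffer))))
    while buffer:  # drain the remaining buffered elements at the end
        shuffled.append(buffer.pop(rng.randrange(len(buffer))))
    # batch(): group consecutive elements into lists of `batch_size`
    batches = [shuffled[i:i + batch_size]
               for i in range(0, len(shuffled), batch_size)]
    # drop_remainder=True discards a final batch smaller than `batch_size`
    if drop_remainder and batches and len(batches[-1]) < batch_size:
        batches.pop()
    return batches

# 10 elements, batch_size=3: three full batches survive, one element is dropped
print(shuffle_and_batch(list(range(10)), buffer_size=4, batch_size=3))
```

With 10 input elements and batch_size=3, drop_remainder=True yields exactly 3 batches of 3 elements (one element is discarded), while drop_remainder=False yields 4 batches with a final batch of size 1. Note also that a small buffer_size only shuffles locally: an element can move at most buffer_size positions earlier than it appeared in the input.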
Upvotes: 1