Reputation: 11686
I am looking at the recurrent neural network walkthrough here. In the tutorial there is a line that does:
dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)
However, if you're building the model with the Sequential API, is that still necessary? Reading the Sequential documentation, it seems a shuffle is performed automatically. If not, why is it done here? Is there a simple numerical example of the effect?
Upvotes: 0
Views: 262
Reputation: 4289
tf.keras.models.Sequential (through Model.fit) can also batch and shuffle data, similar to what tf.data.Dataset does. These preprocessing features are provided by fit because it accepts data in several forms: NumPy arrays, a tf.data.Dataset, a dict of arrays, or a tf.keras.utils.Sequence.
The tf.data.Dataset API provides the same features because it is consistent with other TensorFlow APIs, in which Keras is not involved.
The shuffling and batching do not need to be done twice. You may remove that line if you wish; it will not break training. Note, though, that when a tf.data.Dataset is passed to fit, the shuffle argument of fit has no effect, so dropping dataset.shuffle( ... ) means the data will not be shuffled at all. I think the author used tf.data.Dataset simply as the standard way to feed data into a Keras model, and dataset.shuffle( ... ).batch( ... ) is the idiomatic pattern with Dataset.
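As for the "simple numerical example" part of the question, here is a pure-Python sketch of what shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True) does. This is an illustrative analogue, not TensorFlow's actual implementation: Dataset.shuffle keeps a sliding buffer of buffer_size elements and draws uniformly from it, and batch groups consecutive elements, with drop_remainder discarding a final short batch.

```python
import random

def shuffle_and_batch(data, buffer_size, batch_size, drop_remainder=True, seed=0):
    # Illustrative analogue (an assumption, not the TF source) of
    # dataset.shuffle(buffer_size).batch(batch_size, drop_remainder).
    rng = random.Random(seed)
    buffer, shuffled = [], []
    for item in data:
        # shuffle(): keep a buffer of up to `buffer_size` elements and,
        # once it overflows, emit a uniformly random element from it
        buffer.append(item)
        if len(buffer) > buffer_size:
            shuffled.append(buffer.pop(rng.randrange(len(buffer))))
    while buffer:  # drain the remaining buffered elements at the end
        shuffled.append(buffer.pop(rng.randrange(len(buffer))))
    # batch(): group consecutive elements into lists of `batch_size`
    batches = [shuffled[i:i + batch_size]
               for i in range(0, len(shuffled), batch_size)]
    # drop_remainder=True discards a final batch smaller than `batch_size`
    if drop_remainder and batches and len(batches[-1]) < batch_size:
        batches.pop()
    return batches

# 10 elements, batch_size=3: three full batches survive, one element is dropped
print(shuffle_and_batch(list(range(10)), buffer_size=4, batch_size=3))
```

With 10 input elements and batch_size=3, drop_remainder=True yields exactly 3 batches of 3 elements (one element is discarded), while drop_remainder=False yields 4 batches with a final batch of size 1. Note also that a small buffer_size only shuffles locally: an element can move at most buffer_size positions earlier than it appeared in the input.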
Upvotes: 1