Reputation: 67
Using the code below, I would like to ask a few questions about what exactly is happening under the hood.
dataset = tf.data.TFRecordDataset(filepath)
dataset = dataset.map(parse_function, num_parallel_calls=4)
dataset = dataset.repeat()
dataset = dataset.shuffle(1024)
dataset = dataset.batch(16)
iterator = dataset.make_one_shot_iterator()
1. dataset.map(parse_function, num_parallel_calls=4)
- How many records are we loading here? As many as fit in memory, or some fixed number?
2. dataset = dataset.repeat()
- What exactly do we repeat? The piece of data currently loaded in point 1? If so, does that mean we will never load the others?
3. How exactly does shuffle work?
4. Can we use repeat, shuffle and batch before map, and work on file paths instead of the files themselves?
Upvotes: 0
Views: 1336
Reputation: 1804
repeat and shuffle together here. Upvotes: 1
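The interaction of repeat and shuffle can be modeled in plain Python: when shuffle is applied before repeat, every pass over the data gets its own reshuffle. A minimal sketch (the helper names are made up for illustration; this is not tf.data's implementation):

```python
import random

def shuffled(items, seed=None):
    """Return a shuffled copy of the items (one full shuffle pass)."""
    rng = random.Random(seed)
    out = list(items)
    rng.shuffle(out)
    return out

def shuffle_then_repeat(items, epochs, seed=0):
    """Model of dataset.shuffle(...).repeat(): each epoch is reshuffled."""
    for epoch in range(epochs):
        yield from shuffled(items, seed + epoch)

# Two epochs over four elements: each 4-element slice of the stream
# is a permutation of the input, and the permutations can differ.
stream = list(shuffle_then_repeat([1, 2, 3, 4], epochs=2))
```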
Reputation: 4533
dataset.repeat() without a specified count will repeat the dataset indefinitely. tf.data.TFRecordDataset expects filenames as input. Generally, the preferred order is
dataset = dataset.shuffle(shuffle_buffer).repeat()
dataset = dataset.batch(batch_size)
dataset = dataset.map(map_func)
Take a look at https://www.tensorflow.org/guide/performance/datasets
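Regarding question 3 above: tf.data.Dataset.shuffle keeps a fixed-size buffer, emits an element chosen uniformly at random from that buffer, and refills the freed slot from the input stream. This means shuffling is only uniform over the whole dataset when buffer_size is at least the dataset size. A pure-Python model of that behavior (a sketch of the semantics, not the actual implementation):

```python
import random

def buffered_shuffle(stream, buffer_size, seed=None):
    """Model of buffer-based shuffling: fill a buffer of `buffer_size`
    elements, then repeatedly emit a random buffered element and
    replace it with the next incoming one; drain the buffer at the end."""
    rng = random.Random(seed)
    it = iter(stream)
    buf = []
    # Fill the buffer with up to `buffer_size` elements.
    for item in it:
        buf.append(item)
        if len(buf) == buffer_size:
            break
    # For each remaining element, emit a random buffered element
    # and put the incoming element in its place.
    for item in it:
        i = rng.randrange(len(buf))
        yield buf[i]
        buf[i] = item
    # Input exhausted: drain the buffer in random order.
    rng.shuffle(buf)
    yield from buf

# A buffer of size 1 cannot reorder anything, so the input order is kept:
print(list(buffered_shuffle(range(10), buffer_size=1, seed=0)))
```

This also shows why a small buffer on a sorted dataset gives only "local" shuffling: an element can never move further back than the buffer allows.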
Upvotes: 0