jack

Reputation: 923

What is the mechanism behind tf.data.Dataset.shuffle?

Say I have 100,000 examples in one TFRecord file and I shuffle with a buffer size of 100. Does it shuffle the examples in blocks of 100, loading the next 100 into the buffer only after all of the current ones have been consumed? Or does it refill the buffer with later examples as the training data is consumed and draw uniformly from the buffer, so that at some point every example has some probability of being in the buffer?

I think the latter makes more sense. Is that how the shuffle function is implemented? I looked but found no source explaining the mechanism.

Thanks.

Upvotes: 2

Views: 1920

Answers (1)

dennlinger

Reputation: 11420

Taken from here

The Dataset.shuffle() transformation randomly shuffles the input dataset using a similar algorithm to tf.RandomShuffleQueue: it maintains a fixed-size buffer and chooses the next element uniformly at random from that buffer.
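To make the quoted description concrete, here is a minimal pure-Python sketch of a fixed-size shuffle buffer. It is not TensorFlow's actual implementation: the function name buffered_shuffle and its parameters are made up for illustration, and it assumes the buffer is filled first and that each emitted element is replaced by the next input element. Under those assumptions the behavior matches the second scenario in the question: the buffer is refilled continuously rather than in blocks of 100.

```python
import random


def buffered_shuffle(items, buffer_size, seed=None):
    """Yield items in an order produced by a fixed-size shuffle buffer.

    Hypothetical helper for illustration only, not a TensorFlow API.
    """
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            # Fill the buffer before emitting anything.
            buffer.append(item)
            continue
        # Buffer is full: emit one element chosen uniformly at random
        # and put the newly read item in its slot.
        idx = rng.randrange(len(buffer))
        yield buffer[idx]
        buffer[idx] = item
    # Input exhausted: drain the remaining elements in random order.
    while buffer:
        idx = rng.randrange(len(buffer))
        buffer[idx], buffer[-1] = buffer[-1], buffer[idx]
        yield buffer.pop()


# Small stand-in for the 100,000 examples in the question.
shuffled = list(buffered_shuffle(range(1000), buffer_size=100, seed=0))
```

In this sketch the element at output position k is always one of the first k + 100 input elements, so a small buffer only shuffles locally. A buffer at least as large as the dataset would be needed for a uniform shuffle of the whole input.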

You can find the definition of the operation here, and that directs to the ShuffleDataset.

Additionally, note that the shuffle operation also lets you control whether the dataset is pseudo-randomly reshuffled each time it is iterated over (i.e., after each epoch), via the reshuffle_each_iteration argument.

Upvotes: 1
