ttcong194

Reputation: 21

Can anyone explain how shuffle works in tf.data.Dataset?

I can't figure out how shuffle works in tf.data.Dataset, so I tried looking at the output to guess what happens inside.

import tensorflow as tf

dataset = tf.data.Dataset.range(6)
dataset = dataset.shuffle(buffer_size=1).batch(6)
for item in dataset:
  print(item)

Output:

tf.Tensor([0 1 2 3 4 5], shape=(6,), dtype=int64)

As you can see, shuffle has no effect when buffer_size = 1. When I change it to buffer_size = 2, the order of the list changes, but the first item is only ever 0 or 1, even after running it 100 times.

Can anyone explain the role of buffer_size? I have read the TensorFlow documentation at https://www.tensorflow.org/api_docs/python/tf/data/Dataset#shuffle

From the documentation, I thought that with buffer_size = 1 the element in the buffer would still be replaced by other elements, but the output I get confuses me.

Has anyone run into the same problem?

Upvotes: 1

Views: 312

Answers (1)

ttcong194

Reputation: 21

I have now worked out what is going on.

Here is a step-by-step explanation for my case:

  1. The data source is the list [0,1,2,3,4,5].
  2. Because I set buffer_size = 2, the buffer is first filled with 0 and 1.
  3. When the dataset is consumed (for example through batch or take), one item is picked at random from this buffer. Say it picks 0. The slot that held 0 is then refilled with the next value from the data source in step 1, which is 2, so the buffer stays full. The buffer now holds 2 and 1.
  4. The next item is picked at random from 2 and 1. So the first item can only be 0 or 1, and the first two items together can only be {0,1}, {0,2} or {1,2}, in either order where the first item is 0 or 1.
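
The steps above can be sketched in plain Python. This is only a simulation of the behaviour I described, not TensorFlow's actual implementation; the function name `buffered_shuffle` and its `seed` parameter are made up for illustration.

```python
import random


def buffered_shuffle(source, buffer_size, seed=None):
    """Yield the items of `source` in the order a fixed-size shuffle buffer emits them.

    Hypothetical helper that mimics the steps above; it is not part of TensorFlow.
    """
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    rng = random.Random(seed)
    iterator = iter(source)
    buffer = []
    # Step 2: fill the buffer with the first `buffer_size` items.
    for item in iterator:
        buffer.append(item)
        if len(buffer) == buffer_size:
            break
    # Step 3: emit a random slot, then refill that slot with the next source item.
    for item in iterator:
        index = rng.randrange(len(buffer))
        yield buffer[index]
        buffer[index] = item
    # Source exhausted: emit whatever is left in the buffer, in random order.
    while buffer:
        index = rng.randrange(len(buffer))
        yield buffer.pop(index)


print(list(buffered_shuffle(range(6), buffer_size=1)))  # order is unchanged
print(list(buffered_shuffle(range(6), buffer_size=2)))  # first item is 0 or 1
```

With buffer_size=1 the buffer only ever holds one item, so there is nothing to choose from and the order is unchanged. With buffer_size=2 the first item can only be 0 or 1, which matches the output in my question. In this sketch, a buffer at least as large as the data source gives a full shuffle.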

Upvotes: 1
