Random Certainty

Reputation: 475

How to add randomness in each iteration of tensorflow DataSet?

I'm using the Estimator API. I want to process each minibatch (or each element) dynamically in each iteration through the DataSet.

For example, I would like to add random noise to each element in the dataset every time it is batched and fed into the model_fn.

dataset.map() seems to get called only once, and subsequent passes via dataset.repeat() are static. This is what I tried:

import tensorflow as tf
import numpy as np
import random 

dx = tf.data.Dataset.from_tensor_slices([10.0, 20.0, 30.0])
dx = dx.map(lambda x: x + random.uniform(0, 1)).repeat(2)
for next_element in dx:
    print(next_element.numpy())

Output

10.426203
20.426203
30.426203
10.426203
20.426203
30.426203

One way to do this is to add randomness to the raw data read by input_fn, but then the data won't change between epochs.

Upvotes: 1

Views: 2127

Answers (3)

Prasad

Reputation: 6034

There is a misunderstanding about how the map function works. The function passed to map is applied to every element, but the dataset pipeline itself is built only once. random.uniform(0, 1) is a plain Python call: it runs once, when the pipeline is constructed, and produces a single float. That fixed constant is baked into the pipeline and added to every element on every pass.

To overcome this, use tf.random.uniform(), which inserts a random op into the dataset graph. That op is re-evaluated each time an element passes through the map, so a different random value is generated on every evaluation, even though the dataset is created only once.

So your code should be:

import tensorflow as tf

dx = tf.data.Dataset.from_tensor_slices([10.0, 20.0, 30.0])
dx = dx.map(lambda x: x + tf.random.uniform([], 0, 1)).repeat(2)
for next_element in dx:
    print(next_element.numpy())

Upvotes: 2
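A quick sanity check of the claim above (a sketch, assuming TF 2.x eager execution): collect the elements across both passes of repeat(2) and confirm the two epochs received different noise.

```python
import tensorflow as tf

# The random op in the map is re-evaluated per element, so each pass
# through repeat() sees fresh noise.
dx = tf.data.Dataset.from_tensor_slices([10.0, 20.0, 30.0])
dx = dx.map(lambda x: x + tf.random.uniform([], 0, 1)).repeat(2)

values = [e.numpy() for e in dx]
first_epoch, second_epoch = values[:3], values[3:]
# With independent draws, the two passes (almost surely) differ.
print(first_epoch != second_epoch)
```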

Srihari Humbarwadi

Reputation: 2632

This bit of code should give you the desired outcome:

import tensorflow as tf

def add_noise(x):
    noise = tf.random.uniform(shape=(), minval=0, maxval=1)
    return x + noise

dx = tf.data.Dataset.from_tensor_slices([10.0, 20.0, 30.0])
dx = dx.map(add_noise).repeat(2)
for next_element in dx:
    print(next_element.numpy())
Output

10.931375
20.01276
30.051556
10.825275
20.22412
30.7365

Upvotes: 1

zihaozhihao

Reputation: 4475

One workaround I can think of is generating the noise up front and zipping it with the repeated dataset. There may be better solutions.

import tensorflow as tf
import numpy as np

dx = tf.data.Dataset.from_tensor_slices(np.array([10.0, 20.0, 30.0]))
noise = tf.data.Dataset.from_tensor_slices(np.random.randn(6))
dx = dx.repeat(2)
new_dx = tf.data.Dataset.zip((dx, noise))
for next_element in new_dx:
    data = next_element[0]
    ns = next_element[1]
    input_ = data + ns
    print(input_.numpy())

# 10.969622987669728
# 19.77313649149436
# 30.09365081990082
# 9.950256200151752
# 19.36040356387037
# 29.6192768988015

Upvotes: 0
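One caveat with the zip approach (a sketch): tf.data.Dataset.zip stops as soon as the shortest input is exhausted, so the noise dataset must have at least as many elements as the repeated data or trailing elements are silently dropped.

```python
import tensorflow as tf
import numpy as np

# The repeated data has 6 elements, but the noise dataset only has 4.
dx = tf.data.Dataset.from_tensor_slices(np.array([10.0, 20.0, 30.0])).repeat(2)
noise = tf.data.Dataset.from_tensor_slices(np.random.randn(4))  # too short

# zip truncates to the shorter dataset: the last two data elements are lost.
pairs = list(tf.data.Dataset.zip((dx, noise)))
print(len(pairs))  # 4, not 6
```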
