Reputation: 475
I'm using the Estimator API. I want to dynamically process each minibatch (or each element) on every pass through the Dataset. For example, I would like to add random noise to each element in the dataset every time it is batched and fed into the model_fn.
dataset.map() seems to get called only once, and subsequent passes via dataset.repeat() are static. This is what I tried:
import tensorflow as tf
import random

dx = tf.data.Dataset.from_tensor_slices([10.0, 20.0, 30.0])
dx = dx.map(lambda x: x + random.uniform(0, 1)).repeat(2)
for next_element in dx:
    print(next_element.numpy())
Output:
10.426203
20.426203
30.426203
10.426203
20.426203
30.426203
One way to do this is to add randomness to the raw data read by input_fn, but then the data won't change between epochs.
Upvotes: 1
Views: 2127
Reputation: 6034
There is a problem with your understanding of the map function. map applies its logic to every element, but the dataset pipeline is built only once. random.uniform(0, 1) is a plain Python call: it runs a single time, when the map function is traced, and its result is captured as a fixed float constant in the pipeline. That same constant is then added to every element on every pass.
To fix this, use tf.random.uniform(), which inserts a random op into the dataset graph. That op is re-evaluated every time the map function is applied to an element, so you get a fresh random value each time even though the dataset is defined only once.
So your code should be:
import tensorflow as tf

dx = tf.data.Dataset.from_tensor_slices([10.0, 20.0, 30.0])
dx = dx.map(lambda x: x + tf.random.uniform([], 0, 1)).repeat(2)
for next_element in dx:
    print(next_element.numpy())
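To see the difference concretely, here is a small sketch that contrasts the two behaviors, hoisting the Python-level random.uniform call out of the lambda to make its single evaluation explicit:

```python
import random
import tensorflow as tf

# Python-level randomness: random.uniform runs once, here, and the lambda
# captures its result as a fixed constant in the pipeline.
offset = random.uniform(0, 1)
ds_static = tf.data.Dataset.from_tensor_slices([1.0, 2.0]).map(
    lambda x: x + offset)

# TensorFlow op: tf.random.uniform becomes part of the dataset graph and is
# re-executed for every element on every epoch.
ds_dynamic = tf.data.Dataset.from_tensor_slices([1.0, 2.0]).map(
    lambda x: x + tf.random.uniform([], 0, 1))

static_vals = [v.numpy() for v in ds_static.repeat(2)]
dynamic_vals = [v.numpy() for v in ds_dynamic.repeat(2)]

# The static pipeline repeats the same offset across epochs; the dynamic
# one draws fresh noise for each element, each epoch.
print(static_vals)
print(dynamic_vals)
```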
Upvotes: 2
Reputation: 2632
This bit of code should give you the desired outcome:
import tensorflow as tf

def add_noise(x):
    # Drawn fresh for every element, on every epoch.
    noise = tf.random.uniform(shape=(), minval=0, maxval=1)
    return x + noise

dx = tf.data.Dataset.from_tensor_slices([10.0, 20.0, 30.0])
dx = dx.map(add_noise).repeat(2)
for next_element in dx:
    print(next_element.numpy())
10.931375
20.01276
30.051556
10.825275
20.22412
30.7365
Upvotes: 1
Reputation: 4475
One workaround I can think of is to generate the noise first and zip it with the repeated dataset. There may be better solutions.
import tensorflow as tf
import numpy as np

dx = tf.data.Dataset.from_tensor_slices(np.array([10.0, 20.0, 30.0]))
# Pre-generate one noise value per element per epoch (3 elements x 2 epochs).
noise = tf.data.Dataset.from_tensor_slices(np.random.randn(6))
dx = dx.repeat(2)
new_dx = tf.data.Dataset.zip((dx, noise))
for data, ns in new_dx:
    input_ = data + ns
    print(input_.numpy())
# 10.969622987669728
# 19.77313649149436
# 30.09365081990082
# 9.950256200151752
# 19.36040356387037
# 29.6192768988015
Upvotes: 0