Los

Reputation: 381

Set random labels for images in tf.data.Dataset

I have a tf.data dataset of images with the following element spec:

<_UnbatchDataset element_spec=(TensorSpec(shape=(None, 128, 128, 3), dtype=tf.float32, name=None), TensorSpec(shape=(None,), dtype=tf.int32, name=None))>

All the labels in this dataset are 0. I would like to change each label to a random integer from 0 to 3 (inclusive).
My code is:

import numpy as np

def change_label(image, label):
    return image, np.random.randint(0, 4)

dataset = dataset.map(change_label)

However, this assigns the label 1 to every image. Strangely, no matter how many times I run it, every image still gets 1.
Any ideas?
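The behaviour can be reproduced on a small toy dataset (the 8 blank images and the name change_label_numpy are made up for illustration). The NumPy call is ordinary Python, so it only runs while map traces the function, and every element ends up with the same label:

```python
import numpy as np
import tensorflow as tf

# Toy dataset: 8 blank "images", all labelled 0 as in the question
images = tf.zeros((8, 4, 4, 3))
labels = tf.zeros((8,), dtype=tf.int32)
dataset = tf.data.Dataset.from_tensor_slices((images, labels))

def change_label_numpy(image, label):
    # Runs as Python only while map() traces this function, so the value
    # returned by np.random.randint is frozen into the graph as a constant
    return image, np.random.randint(0, 4)

numpy_labels = [int(lbl) for _, lbl in dataset.map(change_label_numpy)]
print(numpy_labels)  # the same value repeated 8 times
```

Which constant gets frozen in is simply whatever NumPy returned at tracing time.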

Upvotes: 1

Views: 563

Answers (3)

AloneTogether

Reputation: 26708

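The problem is that dataset.map traces your Python function once and runs it as a TensorFlow graph. np.random.randint is plain NumPy code that TensorFlow does not track, so it is executed only during tracing and its result is baked into the graph as a constant; every element then receives that same value. TensorFlow random ops, on the other hand, become part of the graph and are evaluated again for each element. So try something like this: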

import tensorflow as tf

images = tf.random.normal((50, 128, 128, 3))
dataset = tf.data.Dataset.from_tensor_slices((images))

dataset = dataset.map(lambda x: (x, tf.random.uniform((), maxval=4, dtype=tf.int32))).batch(2)

for x, y in dataset.take(1):
  print(x.shape, y)
(2, 128, 128, 3) tf.Tensor([2 2], shape=(2,), dtype=int32)
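The question's element_spec contains (image, label) pairs whose labels have shape (None,), i.e. each element is itself a batch. A sketch adapting the approach above to that case, assuming a toy dataset of 40 small images batched by 8: passing tf.shape(label) as the shape draws one label per entry, and dtype=tf.int32 keeps the original label dtype.

```python
import tensorflow as tf

# Toy stand-in for the question's data: 40 small images, all labelled 0,
# batched so that each element's labels have shape (None,)
images = tf.random.normal((40, 8, 8, 3))
labels = tf.zeros((40,), dtype=tf.int32)
dataset = tf.data.Dataset.from_tensor_slices((images, labels)).batch(8)

def change_label(image, label):
    # tf.random.uniform is a graph op, so it is re-evaluated for every element;
    # using tf.shape(label) also works for scalar (unbatched) labels
    new_label = tf.random.uniform(tf.shape(label), minval=0, maxval=4, dtype=tf.int32)
    return image, new_label

dataset = dataset.map(change_label)
new_labels = [v for _, lbl in dataset for v in lbl.numpy().tolist()]
print(new_labels)  # 40 labels, each in 0..3
```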

Upvotes: 1

Gustasvs

Reputation: 51

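I'd say just iterate over the dataset in a Python loop. A tf.data.Dataset cannot be indexed or modified in place, so collect the images and build a new dataset with random labels (this assumes each element is a single image and that all images fit in memory):

import random
import tensorflow as tf

def change_labels(dataset):
    # Datasets can't be modified in place, so collect the images eagerly
    images = [image for image, _ in dataset]
    labels = [random.randint(0, 3) for _ in images]  # one random label in [0, 3] per image
    return tf.data.Dataset.from_tensor_slices((tf.stack(images), labels))

dataset = change_labels(dataset)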
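If you would rather keep using NumPy's generator, another option is to draw all the labels up front, outside dataset.map, and zip them with the images. This sketch assumes the number of elements n is known (here a toy dataset of 20 images; the seed 42 is arbitrary):

```python
import numpy as np
import tensorflow as tf

# Toy dataset with a known number of elements (assumption: for a real
# pipeline you need the element count up front)
n = 20
images = tf.random.normal((n, 8, 8, 3))
dataset = tf.data.Dataset.from_tensor_slices((images, tf.zeros((n,), dtype=tf.int32)))

# Draw every label in NumPy, outside dataset.map, so each element gets its own value
rng = np.random.default_rng(seed=42)
random_labels = rng.integers(0, 4, size=n).astype(np.int32)

# Drop the old labels and pair each image with one of the pre-drawn labels
images_only = dataset.map(lambda image, label: image)
dataset = tf.data.Dataset.zip((images_only, tf.data.Dataset.from_tensor_slices(random_labels)))

new_labels = [int(lbl) for _, lbl in dataset]
print(new_labels)
```

Unlike collecting the images into a list, this keeps the images streaming through tf.data; only the labels are held in memory.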

Upvotes: 1

I'mahdi

Reputation: 24049

Use a TensorFlow random op such as tf.experimental.numpy.random.randint instead of np.random.randint, so that a new value is drawn for every element:

import tensorflow as tf

def change_label(image, label):
    # A TensorFlow op runs inside the graph, so a new label is drawn for every element
    return image, tf.experimental.numpy.random.randint(0, 4)

dataset = dataset.map(change_label)

for img, lbl in dataset.take(10):
    print(lbl)
# tf.Tensor(1, shape=(), dtype=int64)
# tf.Tensor(0, shape=(), dtype=int64)
# tf.Tensor(2, shape=(), dtype=int64)
# tf.Tensor(2, shape=(), dtype=int64)
# tf.Tensor(1, shape=(), dtype=int64)
# tf.Tensor(3, shape=(), dtype=int64)
# tf.Tensor(0, shape=(), dtype=int64)
# tf.Tensor(3, shape=(), dtype=int64)
# tf.Tensor(2, shape=(), dtype=int64)
# tf.Tensor(3, shape=(), dtype=int64)

The dataset used above was generated like this (with all labels initially set to zero, as in your question):

import numpy as np
x = np.random.rand(100, 128, 128, 3)
y = np.random.randint(0,1, size=100)

dataset = tf.data.Dataset.from_tensor_slices((x,y))

for img,lbl in dataset.take(10):
    print(lbl)
# tf.Tensor(0, shape=(), dtype=int64)
# tf.Tensor(0, shape=(), dtype=int64)
# tf.Tensor(0, shape=(), dtype=int64)
# tf.Tensor(0, shape=(), dtype=int64)
# tf.Tensor(0, shape=(), dtype=int64)
# tf.Tensor(0, shape=(), dtype=int64)
# tf.Tensor(0, shape=(), dtype=int64)
# tf.Tensor(0, shape=(), dtype=int64)
# tf.Tensor(0, shape=(), dtype=int64)
# tf.Tensor(0, shape=(), dtype=int64)
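The random ops above are stateful, so each pass over the dataset (for example each training epoch) can produce a different labelling. If the random labels should instead stay fixed, one option is a stateless op seeded by the element's position. A sketch, assuming a toy dataset of 20 images and an arbitrary base seed (BASE_SEED and stateless_label are hypothetical names):

```python
import tensorflow as tf

# Toy dataset: 20 small images, all labelled 0
images = tf.random.normal((20, 8, 8, 3))
labels = tf.zeros((20,), dtype=tf.int32)
dataset = tf.data.Dataset.from_tensor_slices((images, labels))

BASE_SEED = 42  # hypothetical base seed; change it to get a different labelling

def stateless_label(index, element):
    image, _ = element
    # The seed depends only on BASE_SEED and the element's position, so the
    # same element always gets the same label in [0, 4)
    seed = tf.stack([tf.constant(BASE_SEED, dtype=tf.int64), index])
    new_label = tf.random.stateless_uniform(
        (), seed=seed, minval=0, maxval=4, dtype=tf.int32)
    return image, new_label

relabelled = dataset.enumerate().map(stateless_label)

first_pass = [int(lbl) for _, lbl in relabelled]
second_pass = [int(lbl) for _, lbl in relabelled]
print(first_pass == second_pass)  # True
```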

Upvotes: 2
