Reputation: 73

How to get two tf.dataset from tf.data.Dataset.zip((images, labels))

I am working through the TensorFlow MNIST tutorial in Python.

For a few weeks now, when using the original code from the TensorFlow website, I have been getting a warning that the image dataset will soon be deprecated and that I should use the following one instead: https://github.com/tensorflow/models/blob/master/official/mnist/dataset.py

I load it in my code using:

from tensorflow.models.official.mnist import dataset
trainfile = dataset.train(data_dir)

Which returns:

tf.data.Dataset.zip((images, labels))

The issue is that I cannot find a way to separate them, for example like this:

  trainfile = dataset.train(data_dir)
  train_data= trainfile.images
  train_label= trainfile.label

But this clearly does not work, because the attributes images and label do not exist: trainfile is a tf.data.Dataset.

Knowing that the dataset is made of int32 and float32 components, I tried:

  train_data = trainfile.map(lambda x,y : x.dtype == tf.float32)

But it returns an empty dataset.

I would like to do it this way (two complete batches, one of images and one of labels), though I am open-minded, because this is how the tutorial works:

https://www.tensorflow.org/tutorials/estimators/cnn

I have seen many solutions for getting elements out of a dataset, but nothing that undoes the zip operation performed in the following code:

tf.data.Dataset.zip((images, labels))

Thank you in advance for your help.

Upvotes: 4

Answers (4)

manisar

Reputation: 123

TensorFlow's get_single_element() is finally available and can be used to unzip features and labels from a dataset.

This avoids the need to create and drive an iterator via `.map()`, `iter()` or `one_shot_iterator()` (which could be costly for big datasets).

get_single_element() returns a tensor (or a tuple or dict of tensors) encapsulating all the members of the dataset. For this to work, the dataset must contain exactly one element, so all of its members have to be batched into a single element first.

This can be used to get features as a tensor-array, or features and labels as a tuple or dictionary (of tensor-arrays) depending upon how the original dataset was created.

Check this answer on SO for an example that unpacks features and labels into a tuple of tensor-arrays.
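In case the linked answer is not at hand, here is a minimal sketch of the idea. It assumes TensorFlow 2.6 or later running eagerly (where get_single_element() is a method on tf.data.Dataset), and the toy images and labels arrays are made-up stand-ins for the MNIST data, not the tutorial's real inputs.

```python
import numpy as np
import tensorflow as tf

# Hypothetical toy data standing in for MNIST: 6 "images" of 4 pixels each.
images = np.arange(24, dtype=np.float32).reshape(6, 4)
labels = np.arange(6, dtype=np.int32)

# Rebuild the kind of dataset the question starts from.
zipped = tf.data.Dataset.zip((
    tf.data.Dataset.from_tensor_slices(images),
    tf.data.Dataset.from_tensor_slices(labels),
))

# Batch every member into one element, then pull that single element out.
# The batch size must cover the whole dataset, otherwise
# get_single_element() raises because more than one element remains.
all_images, all_labels = zipped.batch(len(labels)).get_single_element()
```

Here all_images and all_labels are ordinary tensors holding the whole dataset, so this only makes sense when the data fits in memory.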

Upvotes: 1

shivaraj karki

Reputation: 159

You can visualize the images and find their associated labels:

import matplotlib.pyplot as plt
import tensorflow as tf  # TF 1.x API (Session, make_one_shot_iterator)

ds = tf.data.Dataset.from_tensor_slices((x_train, y_train))

ds = ds.shuffle(buffer_size=10).batch(batch_size=batch_size)
# Avoid the names iter/next, which would shadow Python built-ins
iterator = ds.make_one_shot_iterator()
next_batch = iterator.get_next()

def display(image, label):
    # display image
    ...
    plt.imshow(image)
    ...

with tf.Session() as sess:
    try:
        while True:
            image, label = sess.run(next_batch)
            # image = numpy array (batch, image_size)
            # label = numpy array (batch, label)
            display(image[0], label[0])  # display first image in batch
    except tf.errors.OutOfRangeError:
        pass  # raised once the dataset is exhausted

Upvotes: 0

jubueche

Reputation: 793

I hope this helps:

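import tensorflow as tf  # TF 1.x API

# Placeholders as used by a feed_dict-based model; they are replaced below
inputs = tf.placeholder(tf.float32, shape=(None, 784), name='inputs')
outputs = tf.placeholder(tf.float32, shape=(None,), name='outputs')

# Prepare a tensorflow dataset
ds = tf.data.Dataset.from_tensor_slices((x_train, y_train))

ds = ds.shuffle(buffer_size=10, reshuffle_each_iteration=True).batch(batch_size=batch_size, drop_remainder=True).repeat()
# Avoid the names iter/next, which would shadow Python built-ins
iterator = ds.make_one_shot_iterator()
next_element = iterator.get_next()

# Rebind the names to the iterator's tensors: images first, labels second
inputs = next_element[0]
outputs = next_element[1]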
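For readers on TensorFlow 2.x, where tf.placeholder and make_one_shot_iterator() are no longer part of the main API, the same unpacking can be sketched with plain Python iteration. This is only an illustration under assumptions: x_train and y_train below are random stand-in arrays, and batch_size = 4 is an arbitrary choice.

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-ins for the MNIST arrays used in the answer above.
x_train = np.random.rand(20, 784).astype(np.float32)
y_train = np.random.randint(0, 10, size=(20,)).astype(np.float32)
batch_size = 4  # arbitrary value chosen for this sketch

ds = tf.data.Dataset.from_tensor_slices((x_train, y_train))
ds = (ds.shuffle(buffer_size=10, reshuffle_each_iteration=True)
        .batch(batch_size, drop_remainder=True)
        .repeat())

# In eager mode a dataset is a Python iterable; each element is an
# (images, labels) tuple that unpacks directly.
inputs, outputs = next(iter(ds))
```

Each call to next() on the same iterator yields the next (inputs, outputs) batch, so images and labels always stay paired.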

Upvotes: 3

Alexandre Passos

Reputation: 5206

Instead of separating into two datasets, one for images and another for labels, it's best to make a single iterator which returns both the image and the label.

This is preferred because it makes it much easier to ensure that each example stays matched with its label, even after a complicated series of shuffles, reorderings, filterings, etc., as you might have in a nontrivial input pipeline.
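If two separate datasets are really needed, one possible sketch (assuming TensorFlow 2.1 or later in eager mode, with made-up toy arrays in place of MNIST) is to map the zipped dataset twice and keep one component each time. Note the caveat that motivates the advice above: the two results are iterated independently, so if the source dataset shuffles, images and labels will not stay aligned.

```python
import numpy as np
import tensorflow as tf

# Hypothetical toy data standing in for MNIST: 6 "images" of 4 pixels each.
images = np.arange(24, dtype=np.float32).reshape(6, 4)
labels = np.arange(6, dtype=np.int32)

zipped = tf.data.Dataset.zip((
    tf.data.Dataset.from_tensor_slices(images),
    tf.data.Dataset.from_tensor_slices(labels),
))

# Each map receives the (image, label) pair and keeps only one component.
images_ds = zipped.map(lambda image, label: image)
labels_ds = zipped.map(lambda image, label: label)

# Pull values out to check the split (fine here because nothing is shuffled).
first_image = next(iter(images_ds)).numpy()
label_list = [int(value) for value in labels_ds.as_numpy_iterator()]
```

This also shows why the attempt in the question did not help: the lambda there returned the result of a dtype comparison rather than one of the two components.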

Upvotes: 0
