SantoshGupta7

Reputation: 6197

How to add/change names of components to an existing Tensorflow Dataset object?

The TensorFlow dataset guide says:

It is often convenient to give names to each component of an element, for example if they represent different features of a training example. In addition to tuples, you can use collections.namedtuple or a dictionary mapping strings to tensors to represent a single element of a Dataset.

dataset = tf.data.Dataset.from_tensor_slices(
   {"a": tf.random_uniform([4]),
    "b": tf.random_uniform([4, 100], maxval=100, dtype=tf.int32)})
print(dataset.output_types)  # ==> "{'a': tf.float32, 'b': tf.int32}"
print(dataset.output_shapes)  # ==> "{'a': (), 'b': (100,)}"

https://www.tensorflow.org/guide/datasets

And this is very useful in Keras. If you pass a dataset object to model.fit, the names of the components can be used to match the inputs of your Keras model. Example:

image_input = keras.Input(shape=(32, 32, 3), name='img_input')
timeseries_input = keras.Input(shape=(None, 10), name='ts_input')

x1 = layers.Conv2D(3, 3)(image_input)
x1 = layers.GlobalMaxPooling2D()(x1)

x2 = layers.Conv1D(3, 3)(timeseries_input)
x2 = layers.GlobalMaxPooling1D()(x2)

x = layers.concatenate([x1, x2])

score_output = layers.Dense(1, name='score_output')(x)
class_output = layers.Dense(5, activation='softmax', name='class_output')(x)

model = keras.Model(inputs=[image_input, timeseries_input],
                    outputs=[score_output, class_output])

train_dataset = tf.data.Dataset.from_tensor_slices(
    ({'img_input': img_data, 'ts_input': ts_data},
     {'score_output': score_targets, 'class_output': class_targets}))
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(64)

model.fit(train_dataset, epochs=3)

So it would be useful to be able to look up, add, and change the names of components in tf.data.Dataset objects. What is the best way to go about these tasks?

Upvotes: 6

Views: 3438

Answers (2)

Allohvk

Reputation: 1356

While the accepted answer works well for changing the names of existing components, it does not cover adding new ones. This can be done as follows:

y_dataset = x_dataset.map(fn1)

where you can define fn1 however you like:

@tf.function
def fn1(x):
    # Use x to derive the additional components you want, and set their shapes as well.
    # new1 and new2 below are placeholders for tensors you compute from x.
    y = {}
    y.update(x)
    y['new1'] = new1
    y['new2'] = new2
    return y
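
For a version that runs end to end, here is a minimal sketch, assuming TensorFlow 2.x with eager execution. The dataset contents, the function name add_components, and the component names a_squared and b_as_float are made up for illustration; they are not from the original answer.

```python
import tensorflow as tf

# Hypothetical starting dataset with two named components.
x_dataset = tf.data.Dataset.from_tensor_slices(
    {"a": tf.constant([1.0, 2.0, 3.0]), "b": tf.constant([10, 20, 30])})


def add_components(x):
    # Copy the existing components so the incoming dict is left untouched.
    y = dict(x)
    # Illustrative new components, each derived from an existing one.
    y["a_squared"] = x["a"] * x["a"]
    y["b_as_float"] = tf.cast(x["b"], tf.float32)
    return y


y_dataset = x_dataset.map(add_components)

# Take the first element to inspect the resulting components.
first = next(iter(y_dataset))
print(sorted(first.keys()))
```

Since map traces the function itself, the @tf.function decorator is left off in this sketch.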

Upvotes: 1

P-Gn

Reputation: 24581

You can use map to modify your dataset, if that is what you are looking for. For example, to transform a plain tuple output into a dict with meaningful names:

import tensorflow as tf

# dummy example
ds_ori = tf.data.Dataset.zip((tf.data.Dataset.range(0, 10), tf.data.Dataset.range(10, 20)))
ds_renamed = ds_ori.map(lambda x, y: {'input': x, 'output': y})

batch_ori = ds_ori.make_one_shot_iterator().get_next()
batch_renamed = ds_renamed.make_one_shot_iterator().get_next()

with tf.Session() as sess:
  print(sess.run(batch_ori))
  print(sess.run(batch_renamed))
  # (0, 10)
  # {'input': 0, 'output': 10}
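
The snippet above uses the TF1 session API. Under TensorFlow 2.x the same idea might look like the sketch below, which assumes eager execution. It also covers the "look up" part of the question by reading the names from element_spec, and shows renaming a single component of a dataset that already yields dicts. rename_component is a hypothetical helper and 'target' is an illustrative name; neither comes from the original answer.

```python
import tensorflow as tf

ds_ori = tf.data.Dataset.zip(
    (tf.data.Dataset.range(0, 10), tf.data.Dataset.range(10, 20)))

# Tuple -> dict: give each component a name.
ds_named = ds_ori.map(lambda x, y: {"input": x, "output": y})

# Look up the current component names from the dataset's element_spec.
names = sorted(ds_named.element_spec.keys())
print(names)


def rename_component(element, old, new):
    # Hypothetical helper: return a copy of the dict with one key renamed.
    renamed = dict(element)
    renamed[new] = renamed.pop(old)
    return renamed


# Rename 'output' to 'target' while leaving 'input' as it is.
ds_renamed = ds_named.map(lambda e: rename_component(e, "output", "target"))

# In eager mode the dataset can be iterated directly, without a session.
first = next(iter(ds_renamed))
print({k: int(v) for k, v in first.items()})
```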

Upvotes: 6
