How to add/change names of components to an existing Tensorflow Dataset object?

Question

The TensorFlow dataset guide says:

It is often convenient to give names to each component of an element, for example if they represent different features of a training example. In addition to tuples, you can use collections.namedtuple or a dictionary mapping strings to tensors to represent a single element of a Dataset.

dataset = tf.data.Dataset.from_tensor_slices(
   {"a": tf.random_uniform([4]),
    "b": tf.random_uniform([4, 100], maxval=100, dtype=tf.int32)})
print(dataset.output_types)  # ==> "{'a': tf.float32, 'b': tf.int32}"
print(dataset.output_shapes)  # ==> "{'a': (), 'b': (100,)}"
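The guide snippet above uses the TensorFlow 1.x API (tf.random_uniform, output_types, output_shapes). Here is a minimal sketch of the same thing, assuming TensorFlow 2.x is installed. It uses tf.random.uniform and Dataset.element_spec instead. For a dataset whose elements are dicts, element_spec is a dict of tf.TensorSpec objects keyed by component name, which covers the "look up" part of the question.

```python
import tensorflow as tf

# TF 2.x version of the guide snippet: tf.random_uniform became tf.random.uniform,
# and output_types/output_shapes are replaced by a single element_spec.
dataset = tf.data.Dataset.from_tensor_slices(
    {"a": tf.random.uniform([4]),
     "b": tf.random.uniform([4, 100], maxval=100, dtype=tf.int32)})

# element_spec mirrors the structure of one element; for dict elements,
# its keys are the component names and its values are tf.TensorSpec objects.
spec = dataset.element_spec
print(sorted(spec.keys()))  # ['a', 'b']
print(spec["a"].dtype, spec["a"].shape)
print(spec["b"].dtype, spec["b"].shape)
```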

https://www.tensorflow.org/guide/datasets

This is very useful in Keras. If you pass a dataset to model.fit, the names of its components are matched against the names of your Keras model's inputs (and outputs). For example:

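import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

image_input = keras.Input(shape=(32, 32, 3), name='img_input')
timeseries_input = keras.Input(shape=(None, 10), name='ts_input')

x1 = layers.Conv2D(3, 3)(image_input)
x1 = layers.GlobalMaxPooling2D()(x1)

x2 = layers.Conv1D(3, 3)(timeseries_input)
x2 = layers.GlobalMaxPooling1D()(x2)

x = layers.concatenate([x1, x2])

score_output = layers.Dense(1, name='score_output')(x)
class_output = layers.Dense(5, activation='softmax', name='class_output')(x)

model = keras.Model(inputs=[image_input, timeseries_input],
                    outputs=[score_output, class_output])

# The model must be compiled before fit; losses are keyed by output layer name.
model.compile(optimizer='rmsprop',
              loss={'score_output': 'mse',
                    'class_output': 'categorical_crossentropy'})

# Placeholder random data; replace with your own arrays.
img_data = np.random.random_sample(size=(100, 32, 32, 3))
ts_data = np.random.random_sample(size=(100, 20, 10))
score_targets = np.random.random_sample(size=(100, 1))
class_targets = np.random.randint(2, size=(100, 5))

# Dict keys must match the `name` of each Input / output layer above.
train_dataset = tf.data.Dataset.from_tensor_slices(
    ({'img_input': img_data, 'ts_input': ts_data},
     {'score_output': score_targets, 'class_output': class_targets}))
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(64)

model.fit(train_dataset, epochs=3)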

So it would be useful to be able to look up, add, and change the names of components in tf.data.Dataset objects. What is the best way to do each of these?
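For the renaming part, one approach is to map over the dataset and rebuild the feature dict with new keys. As far as I know, tf.data has no dedicated rename method. This is a minimal sketch, assuming TensorFlow 2.x. The source key names "image" and "series" and the RENAME mapping are hypothetical. They show how a dataset could be made to match the Keras input names img_input and ts_input used above.

```python
import tensorflow as tf

# Hypothetical source dataset whose feature keys ("image", "series") do not
# match the Keras Input names ("img_input", "ts_input").
features = {"image": tf.zeros([8, 32, 32, 3]), "series": tf.zeros([8, 20, 10])}
targets = {"score_output": tf.zeros([8, 1]), "class_output": tf.zeros([8, 5])}
ds = tf.data.Dataset.from_tensor_slices((features, targets))

# Old name -> new name (assumed for illustration); unlisted keys are kept.
RENAME = {"image": "img_input", "series": "ts_input"}

def rename_features(x, y):
    # For a (features, targets) tuple element, map() passes the two parts as
    # separate arguments; only the keys of the feature dict are rewritten.
    return {RENAME.get(k, k): v for k, v in x.items()}, y

renamed = ds.map(rename_features)
print(sorted(renamed.element_spec[0].keys()))  # ['img_input', 'ts_input']
```

The tensors themselves are untouched; only the dict structure of each element changes. So the renamed dataset can be passed to model.fit in place of the original.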

Allohvk · Accepted Answer

While the accepted answer is good for changing the names of existing components, it does not cover adding new ones. Adding components can be done with Dataset.map:

y_dataset = x_dataset.map(fn1)

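where fn1 is any function you define that takes one element (here a dict of components) and returns a new dict containing the existing components plus the new ones, for example: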

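import tensorflow as tf

@tf.function
def fn1(x):
    # Use x to derive the additional components you want, and set their
    # shapes as well; new1 and new2 are placeholders for those tensors.
    y = {}
    y.update(x)  # keep all existing components
    y['new1'] = new1
    y['new2'] = new2
    return y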
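Here is a concrete, runnable version of the same idea, assuming TensorFlow 2.x and a small hypothetical dataset with float components "a" and "b". The new component names "sum" and "a_squared" are illustrative only. The @tf.function decorator is left out because Dataset.map already traces the mapped function into a graph.

```python
import tensorflow as tf

# Small hypothetical dataset with two existing components.
x_dataset = tf.data.Dataset.from_tensor_slices(
    {"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 30.0]})

def add_components(x):
    # Copy the existing components, then add new ones derived from them.
    y = dict(x)
    y["sum"] = x["a"] + x["b"]
    y["a_squared"] = tf.square(x["a"])
    return y

y_dataset = x_dataset.map(add_components)
print(sorted(y_dataset.element_spec.keys()))  # ['a', 'a_squared', 'b', 'sum']
for element in y_dataset.take(1).as_numpy_iterator():
    print(element["sum"], element["a_squared"])  # 11.0 1.0
```

Shapes of the new components are inferred from the operations that produce them. If a shape comes out less specific than expected, tf.ensure_shape can be applied inside the mapped function.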
