Reputation: 6197
From the Tensorflow dataset guide it says
It is often convenient to give names to each component of an element, for example if they represent different features of a training example. In addition to tuples, you can use collections.namedtuple or a dictionary mapping strings to tensors to represent a single element of a Dataset.
dataset = tf.data.Dataset.from_tensor_slices(
{"a": tf.random_uniform([4]),
"b": tf.random_uniform([4, 100], maxval=100, dtype=tf.int32)})
print(dataset.output_types) # ==> "{'a': tf.float32, 'b': tf.int32}"
print(dataset.output_shapes) # ==> "{'a': (), 'b': (100,)}"
https://www.tensorflow.org/guide/datasets
And this is very useful in Keras. If you pass a dataset object to model.fit
, the names of the components can be used to match the inputs of your Keras model. Example:
image_input = keras.Input(shape=(32, 32, 3), name='img_input')
timeseries_input = keras.Input(shape=(None, 10), name='ts_input')
x1 = layers.Conv2D(3, 3)(image_input)
x1 = layers.GlobalMaxPooling2D()(x1)
x2 = layers.Conv1D(3, 3)(timeseries_input)
x2 = layers.GlobalMaxPooling1D()(x2)
x = layers.concatenate([x1, x2])
score_output = layers.Dense(1, name='score_output')(x)
class_output = layers.Dense(5, activation='softmax', name='class_output')(x)
model = keras.Model(inputs=[image_input, timeseries_input],
outputs=[score_output, class_output])
train_dataset = tf.data.Dataset.from_tensor_slices(
({'img_input': img_data, 'ts_input': ts_data},
{'score_output': score_targets, 'class_output': class_targets}))
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(64)
model.fit(train_dataset, epochs=3)
So it would be useful for look up, add, and change names to components in tf dataset objects. What is the best way to go about doing these tasks?
Upvotes: 6
Views: 3438
Reputation: 1356
While the accepted answer is good for changing names of (existing)components, it does not talk about 'addition'. This can be done as follows:
y_dataset = x_dataset.map(fn1)
where you can define fn1 as you want
@tf.function
def fn1(x):
##use x to derive additional columns u want. Set the shape as well
y = {}
y.update(x)
y['new1'] = new1
y['new2'] = new2
return y
Upvotes: 1
Reputation: 24581
You can use map
to bring modifications to your dataset, if that is what you are looking for. For example, to transform a plain tuple
output to a dict
with meaningful names,
import tensorflow as tf
# dummy example
ds_ori = tf.data.Dataset.zip((tf.data.Dataset.range(0, 10), tf.data.Dataset.range(10, 20)))
ds_renamed = ds_ori.map(lambda x, y: {'input': x, 'output': y})
batch_ori = ds_ori.make_one_shot_iterator().get_next()
batch_renamed = ds_renamed.make_one_shot_iterator().get_next()
with tf.Session() as sess:
print(sess.run(batch_ori))
print(sess.run(batch_renamed))
# (0, 10)
# {'input': 0, 'output': 10}
Upvotes: 6