Reputation: 253
I am struggling a little with the tf.data.Dataset API when I try to feed multiple inputs to an LSTM, that is, for each feature a vector of length n (the number of steps in the time series), with, let us say, 5 features. Thus I have a list of 5 vectors, each of length, let us say, n=3.
For example, I have a generator that yields, at every step, data with the following structure:
[
array(
[
[5.00000000e-01, 5.00000000e-01, 5.00000000e-01],
[9.00000000e+00, 9.00000000e+00, 9.00000000e+00],
[7.00000000e+00, 9.00000000e+00, 1.00000000e+01],
[6.30841636e-03, 4.22776321e-02, 1.49106372e-02],
[4.00000000e+00, 1.00000000e+01, 2.20000000e+01]
]),
array(
[
[ 9, 9, 9],
[13, 13, 13]
]
)
]
When I try to pass it to the API with the following code:
tf.data.Dataset.from_generator(
generator=lambda: generator,
output_types=(
(
(tf.float32, tf.float32, tf.float32),
(tf.int32, tf.int32, tf.int32),
(tf.int32, tf.int32, tf.int32),
(tf.float32, tf.float32, tf.float32),
(tf.int32, tf.int32, tf.int32)
),
(
(tf.int32, tf.int32, tf.int32),
(tf.int32, tf.int32, tf.int32)
)
)
)
I get the error:
TypeError: generator yielded an element that did not match the expected structure. The expected structure was .... but the yielded element was ... .
What am I missing? How do I write the correct output_shape? Or is it not possible to give the generator for tf.data a nested structure? How do I handle multiple inputs and outputs with tf.data.Dataset.from_generator?
Thanks in advance for any help.
Upvotes: 0
Views: 2636
Reputation: 138
First of all, it seems that from_generator cannot handle a generator that yields lists of arrays, as this results in the following exception:
TypeError: unhashable type: 'list'
Simply switching to a generator that yields tuples of arrays seems to fix this error.
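If the existing generator cannot easily be changed, one option is to wrap it so that each yielded list is converted to a tuple. The sketch below uses only NumPy. The names list_generator and as_tuples are hypothetical, and list_generator is just a stand-in for the generator from the question:

```python
import numpy as np


def list_generator():
    # Hypothetical stand-in for the question's generator:
    # each element is a *list* of two arrays (features, targets).
    for _ in range(3):
        features = np.random.rand(5, 3)
        targets = np.random.randint(0, 20, size=(2, 3))
        yield [features, targets]


def as_tuples(gen_fn):
    """Wrap a generator function so each yielded list becomes a tuple."""
    def wrapped():
        for element in gen_fn():
            yield tuple(element)
    return wrapped


# tuple_generator is a callable, which is what from_generator expects.
tuple_generator = as_tuples(list_generator)
first = next(tuple_generator())
```

The wrapped callable, rather than a generator object, can then be passed to from_generator.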
Next, according to the documentation, as output_types you should provide a nested structure of tf.DType objects corresponding to each component of an element yielded by the generator.
In this case, the elements your generator is yielding are tuples of two arrays. You should therefore provide a nested structure of tf.DType objects corresponding to each component/array. In other words, as output_types you should provide a tuple containing two tf.DType objects, indicating the desired type of each array (instead of trying to indicate the desired type of each value in each array).
The following code can give you an idea of how to properly use from_generator:
import numpy as np
import tensorflow as tf

def generator():
    # Yield a tuple (not a list) of two arrays per element.
    for _ in range(10):
        yield (np.random.rand(5, 3), np.random.rand(2, 3))

# One tf.DType per yielded array, not one per value.
dataset = tf.data.Dataset.from_generator(generator,
                                         output_types=(tf.float32, tf.float32))
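Since the question also asks about shapes, here is a minimal sketch that additionally passes output_shapes, with one shape per array, and uses an integer type for the second array. It assumes TensorFlow 2.x with eager execution. The constant N_STEPS and the value ranges are made up for illustration. Newer TensorFlow versions also accept an output_signature argument built from tf.TensorSpec objects, which may be preferable there:

```python
import numpy as np
import tensorflow as tf

N_STEPS = 3  # assumed number of time steps, as in the question


def generator():
    for _ in range(10):
        # 5 input features and 2 targets, each a series of N_STEPS values.
        features = np.random.rand(5, N_STEPS).astype(np.float32)
        targets = np.random.randint(0, 20, size=(2, N_STEPS)).astype(np.int32)
        yield (features, targets)


dataset = tf.data.Dataset.from_generator(
    generator,
    # One dtype and one shape per yielded array.
    output_types=(tf.float32, tf.int32),
    output_shapes=(tf.TensorShape([5, N_STEPS]), tf.TensorShape([2, N_STEPS])),
)

# Batching adds a leading batch dimension to both components.
batched = dataset.batch(4)
```

Note that a single array has a single dtype, so features that need different types would have to be yielded as separate arrays, each with its own entry in output_types.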
Upvotes: 4