Multiple inputs for the tf.data API with generators

Question

I am struggling a little with the tf.data.Dataset API when I try to feed multiple inputs to an LSTM: each feature is a vector of length n (the number of steps in the time series), and there are, let us say, 5 features. Thus I have a list of 5 vectors, each of length, let us say, n=3.

For example, I have a generator which yields, at every step, data with the following structure:

      [
       array( 
        [
         [5.00000000e-01, 5.00000000e-01, 5.00000000e-01],
         [9.00000000e+00, 9.00000000e+00, 9.00000000e+00],
         [7.00000000e+00, 9.00000000e+00, 1.00000000e+01],
         [6.30841636e-03, 4.22776321e-02, 1.49106372e-02],
         [4.00000000e+00, 1.00000000e+01, 2.20000000e+01]
        ]), 
       array(
        [
         [ 9,  9,  9],
         [13, 13, 13]
        ]
       )
      ]

and when I try to pass it to the API with the following code:

tf.data.Dataset.from_generator(
    generator=lambda: generator,
    output_types=(
        (
            (tf.float32, tf.float32, tf.float32),
            (tf.int32, tf.int32, tf.int32),
            (tf.int32, tf.int32, tf.int32),
            (tf.float32, tf.float32, tf.float32),
            (tf.int32, tf.int32, tf.int32)
        ),
        (
            (tf.int32, tf.int32, tf.int32),
            (tf.int32, tf.int32, tf.int32)
        )
    )
)

I get the error:

TypeError: generator yielded an element that did not match the expected structure. The expected structure was .... but the yielded element was ... .

What am I missing? How do I write the correct output_types? Or is it not possible to give the generator for tf.data a nested structure? How do I handle multiple inputs and outputs with tf.data.Dataset.from_generator?

Thanks in advance for any help.

woebs · Accepted Answer

First of all, it seems that from_generator cannot handle a generator that yields lists of arrays, as this results in the following exception:

TypeError: unhashable type: 'list'

Simply switching to a generator that yields tuples of arrays seems to fix this error.
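If the generator itself cannot easily be changed, a thin wrapper can do the conversion. The following is only a minimal sketch: list_generator is a hypothetical stand-in for the generator in the question, and as_tuples is an illustrative helper name, not part of TensorFlow.

```python
import numpy as np


def list_generator():
    # Hypothetical generator that yields a list of two arrays per step,
    # mirroring the structure shown in the question.
    for _ in range(3):
        yield [np.random.rand(5, 3), np.random.randint(0, 20, size=(2, 3))]


def as_tuples(gen_fn):
    # Wrap a generator function so that each yielded list becomes a tuple.
    def wrapped():
        for element in gen_fn():
            yield tuple(element)
    return wrapped


tuple_generator = as_tuples(list_generator)
first = next(tuple_generator())
print(type(first).__name__, len(first))
```

The wrapped function tuple_generator can then be passed to from_generator in place of the original generator function.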

Next, according to the documentation, as output_types you should provide a nested structure of tf.DType objects corresponding to each component of an element yielded by the generator.

In this case, the elements your generator is yielding are tuples of two arrays. You should therefore provide a nested structure of tf.DType objects corresponding to each component/array. Or in other words, as output_types you should provide a tuple containing two tf.DType objects, indicating the desired type of each array (instead of trying to indicate the desired type of each value in each array).

The following code can give you an idea of how to properly use from_generator:

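import numpy as np
import tensorflow as tf


def generator():
    # Each step yields a tuple of two arrays: shape (5, 3) and shape (2, 3).
    for _ in range(10):
        yield (np.random.rand(5, 3), np.random.rand(2, 3))


# One tf.DType per array in the yielded tuple, not one per value.
dataset = tf.data.Dataset.from_generator(generator,
                                         output_types=(tf.float32, tf.float32))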
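For the data in the question, where the second array holds integers, the same idea might look as follows. This is a sketch under assumptions rather than a definitive recipe: it assumes TensorFlow 2.4 or later, where from_generator accepts an output_signature argument built from tf.TensorSpec objects, and the generator body and value ranges are made up for illustration.

```python
import numpy as np
import tensorflow as tf


def generator():
    # Hypothetical stand-in for the asker's generator: one (5, 3) float array
    # of features and one (2, 3) integer array per step.
    for _ in range(10):
        features = np.random.rand(5, 3).astype(np.float32)
        targets = np.random.randint(0, 20, size=(2, 3)).astype(np.int32)
        yield (features, targets)


# One TensorSpec per array: the shape and dtype of the whole array,
# not of each individual value (assumes TensorFlow >= 2.4).
dataset = tf.data.Dataset.from_generator(
    generator,
    output_signature=(
        tf.TensorSpec(shape=(5, 3), dtype=tf.float32),
        tf.TensorSpec(shape=(2, 3), dtype=tf.int32),
    ),
)

for features, targets in dataset.take(1):
    print(features.shape, targets.dtype)
```

A nested element such as ((input_a, input_b), target) should follow the same pattern, with the signature mirroring the nesting and one spec per array.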
