MarcoM

Reputation: 1204

tf.data: create a Dataset from a list of NumPy arrays of different shapes

I have a list of NumPy arrays with different shapes.

I need to create a Dataset such that each requested element is a tensor with the shape and values of the corresponding NumPy array.

How can I achieve this?

This does NOT work:

dataset = tf.data.Dataset.from_tensor_slices(list_of_arrays)

since, as expected, it raises:

Can't convert non-rectangular Python sequence to Tensor.

P.S. I know that a Dataset whose elements have different shapes cannot be batched.

Upvotes: 1

Views: 1577

Answers (2)

Timbus Calin

Reputation: 14983

Have you tried converting the list to a ragged tensor first?

tensor_with_from_dimensions = tf.ragged.constant([[1, 2], [3], [4, 5, 6]])

Bear in mind that:

All scalar values in pylist must have the same nesting depth K, and the returned RaggedTensor will have rank K. If pylist contains no scalar values, then K is one greater than the maximum depth of empty lists in pylist. All scalar values in pylist must be compatible with dtype.

You can read more about it here: https://www.tensorflow.org/api_docs/python/tf/ragged/constant
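For what it's worth, here is a minimal sketch of how a ragged tensor could then feed a Dataset, assuming TensorFlow 2.x and 1-D arrays. The contents of `list_of_arrays` are made-up sample data, and the sketch uses `tf.RaggedTensor.from_row_lengths` in place of `tf.ragged.constant` so that the NumPy arrays do not have to be turned into nested Python lists first.

```python
import numpy as np
import tensorflow as tf

# Hypothetical sample data: 1-D arrays of different lengths.
list_of_arrays = [np.array([1, 2]), np.array([3]), np.array([4, 5, 6])]

# Build a RaggedTensor from the flat values plus the length of each row.
ragged = tf.RaggedTensor.from_row_lengths(
    values=np.concatenate(list_of_arrays),
    row_lengths=[len(a) for a in list_of_arrays],
)

# Slicing along the first axis gives one Dataset element per original array.
dataset = tf.data.Dataset.from_tensor_slices(ragged)

# Collect the elements back into Python lists to inspect them.
rows = [row.numpy().tolist() for row in dataset]
print(rows)
```

Each element should come back with the length of the array it was built from, so no padding is involved.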

Upvotes: 2

MarcoM

Reputation: 1204

I've accepted the solution from Timbus Calin since it is the more compact one, but I've found another way that provides a lot of flexibility, and it's worth mentioning here.

It's based on generators:

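def create_generator(list_of_arrays):
    # Each yielded array becomes one Dataset element.
    for array in list_of_arrays:
        yield array

# output_shapes=(None, 4): variable first dimension, second dimension fixed at 4.
dataset = tf.data.Dataset.from_generator(
    lambda: create_generator(list_of_arrays),
    output_types=tf.float32, output_shapes=(None, 4))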
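In more recent TensorFlow 2.x releases, `output_types` and `output_shapes` are deprecated in favour of `output_signature`. Below is a sketch of the same generator idea under that assumption (TensorFlow 2.4 or later); the sample arrays of shape (n, 4) are made up for illustration.

```python
import numpy as np
import tensorflow as tf

# Hypothetical sample data: the row count varies, the column count is 4.
list_of_arrays = [
    np.zeros((2, 4), dtype=np.float32),
    np.ones((5, 4), dtype=np.float32),
]

def create_generator():
    # Each yielded array becomes one Dataset element.
    for array in list_of_arrays:
        yield array

# None marks the dimension that is allowed to differ between elements.
dataset = tf.data.Dataset.from_generator(
    create_generator,
    output_signature=tf.TensorSpec(shape=(None, 4), dtype=tf.float32),
)

# Each element keeps the shape of the array it came from.
shapes = [element.numpy().shape for element in dataset]
print(shapes)
```

If the arrays differ in every dimension, the same sketch should work with `shape=(None, None)`, or with `shape=None` when even the rank is not fixed.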

Upvotes: 3
