mon

Reputation: 22388

Tensorflow - how to create a Dataset which is an array of tuples

Question

A Dataset can be a collection of tuples whose components have different types. I can create a dataset from a single tuple:

tf.data.Dataset.from_tensors(
    ([1, 2, 3], 'A')
)
-----
<TensorDataset shapes: ((3,), ()), types: (tf.int32, tf.string)>

How can I create a dataset from an array of tuples?

tf.data.Dataset.from_tensors(
    [
        ([1, 2, 3], 'A'), 
        ([4, 5, 6], 'B')
    ]
)
----
ValueError: Can't convert non-rectangular Python sequence to Tensor.

The IMDB reviews dataset from TensorFlow Datasets is an example of a collection of tuples with different types, so there should be a way.

import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds

imdb, info = tfds.load("imdb_reviews", with_info=True, as_supervised=True)
train_data, test_data = imdb['train'], imdb['test']

print(train_data.take(2))
for text, label in train_data.take(2).as_numpy_iterator():
  print("{}, {}".format(text[0:64], label))
----
<TakeDataset shapes: ((), ()), types: (tf.string, tf.int64)>
b'Being a fan of silent films, I looked forward to seeing this pic', 0
b"I haven't seen this film in years so my knowledge is a little ru", 1

Upvotes: 1

Views: 1408

Answers (1)

Nicolas Gervais

Reputation: 36714

It works for the IMDB dataset because the text and the label are separate features. Your example also works once the features are separated, i.e. passed as multiple inputs: one array holding all the sequences and another holding all the labels.

import numpy as np
import tensorflow as tf

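# One array per tuple component; both have two rows.
input_1 = np.array(([1, 2, 3], [4, 5, 6]))
input_2 = np.array((['A'], ['B']))

# A tuple of arrays is sliced along the first axis and the rows are paired up.
tf.data.Dataset.from_tensor_slices((input_1, input_2))
-----
<TensorSliceDataset shapes: ((3,), (1,)), types: (tf.int32, tf.string)>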

For example, iterating over the dataset yields one (features, label) tuple per row:

ds = tf.data.Dataset.from_tensor_slices((input_1, input_2))

next(iter(ds))
(<tf.Tensor: shape=(3,), dtype=int32, numpy=array([1, 2, 3])>,
 <tf.Tensor: shape=(1,), dtype=string, numpy=array([b'A'], dtype=object)>)
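
If the data already exists as a list of (features, label) tuples, as in the question, one possible way to get to the separated form is to transpose it with zip first. This is only a sketch: split_columns and make_dataset are hypothetical helper names, not part of TensorFlow, and it assumes a non-empty list in which every feature list has the same length.

```python
def split_columns(records):
    """Turn [(features, label), ...] into ([features, ...], [label, ...]).

    Hypothetical helper; assumes `records` is non-empty.
    """
    features, labels = zip(*records)
    return list(features), list(labels)


def make_dataset(records):
    """Build a dataset with one (features, label) element per record."""
    # Imported here so split_columns stays usable without TensorFlow.
    import tensorflow as tf

    features, labels = split_columns(records)
    # Each component is sliced along its first axis and paired by position.
    return tf.data.Dataset.from_tensor_slices((features, labels))


records = [
    ([1, 2, 3], 'A'),
    ([4, 5, 6], 'B'),
]
features, labels = split_columns(records)

# With TensorFlow installed:
# ds = make_dataset(records)
```

Unlike the (1,)-shaped string component above, the labels here are plain strings, so each label should come out as a scalar tf.string tensor, the same layout as in the IMDB dataset.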

Upvotes: 4
