Volodymyr Frolov

Reputation: 1306

How to batch CsvDataset correctly in Tensorflow 2.0?

I'm using tf.data.experimental.make_csv_dataset to create a dataset from a .csv file. I'm also using tf.keras.layers.DenseFeatures as an input layer of my model.

I'm struggling to create a DenseFeatures layer that is compatible with my dataset when the batch_size parameter of make_csv_dataset is not equal to 1 (with batch_size=1 my setup works as expected).

I create the DenseFeatures layer from a list of tf.feature_column.numeric_column elements with shape=(my_batch_size,), but in that case the input layer seems to expect the shape [my_batch_size, my_batch_size] instead of [my_batch_size, 1].

With my_batch_size=19 I'm getting the following error when trying to fit the model:

ValueError: Cannot reshape a tensor with 19 elements to shape [19,19] (361 elements) for 'MyModel/Input/MyColumn1/Reshape' (op: 'Reshape') with input shapes: [19,1], [2] and with input tensors computed as partial shapes: input[1] = [19,19].

If I don't specify shape when creating numeric_column it doesn't work either. I'm getting the following error:

tensorflow.python.framework.errors_impl.InvalidArgumentError:  The second input must be a scalar, but it has shape [19]

which suggests that numeric_column expects a scalar but receives the whole batch in one Tensor.

How do I create an input layer of DenseFeatures so that it accepts the dataset produced by make_csv_dataset(batch_size=my_batch_size)?

Upvotes: 2

Views: 739

Answers (1)

AlexisBRENON

Reputation: 3099

From the tf.feature_column.numeric_column documentation:

shape: An iterable of integers specifies the shape of the Tensor. An integer can be given which means a single dimension Tensor with given width. The Tensor representing the column will have the shape of [batch_size] + shape.

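This means that the shape argument describes a single example, without the batch dimension, because TensorFlow prepends batch_size itself. You must therefore not pass the batch size to the shape argument. For a scalar column, use shape=().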

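Currently, with a batch size of 1, you pass shape=(1,), so the expected tensor shape is [1, 1]. That holds a single element, which is exactly what a batch of one scalar value contains, so the reshape succeeds. With a batch size of 19, shape=(19,) asks for [19, 19] (361 elements) while the batch only holds 19 values, hence the error in your question.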
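
The element-count arithmetic behind the reshape error can be sketched in plain Python. This is only an illustration of the documented rule [batch_size] + shape, not TensorFlow code: the helper names dense_tensor_shape and reshape_is_possible are hypothetical, and the sketch assumes each CSV row contributes one scalar value per column.

```python
from math import prod


def dense_tensor_shape(batch_size, column_shape):
    """Shape the column tensor is expected to have: [batch_size] + shape."""
    return [batch_size] + list(column_shape)


def reshape_is_possible(batch_size, column_shape, values_per_row=1):
    """Check whether a batch has as many elements as the expected shape.

    Assumption: every row in the batch carries `values_per_row` values
    for this column (1 for a scalar CSV column).
    """
    available = batch_size * values_per_row
    expected = prod(dense_tensor_shape(batch_size, column_shape))
    return available == expected


# Batch size passed as the column shape: 19 values cannot fill [19, 19].
print(dense_tensor_shape(19, (19,)))
print(reshape_is_possible(19, (19,)))

# Per-example shape without the batch dimension: the counts match.
print(reshape_is_possible(19, ()))

# Batch size 1 with shape=(1,): [1, 1] holds exactly one element.
print(reshape_is_possible(1, (1,)))
```

The only case where the counts disagree is the one that passes the batch size as the column shape, which matches the 19 versus 361 elements reported in the ValueError.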

Hope this helps. Please post more of your code if you need further help.

Upvotes: 1
