I'm experimenting with TensorFlow 2.0 alpha and I've found that it works as expected when using NumPy arrays, but when a tf.data.Dataset is used, an input dimension error appears. I'm using the iris dataset as the simplest example to demonstrate this:
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, OneHotEncoder
import tensorflow as tf
from tensorflow.python import keras
iris = datasets.load_iris()
scl = StandardScaler()
ohe = OneHotEncoder(categories='auto')
data_norm = scl.fit_transform(iris.data)
data_target = ohe.fit_transform(iris.target.reshape(-1,1)).toarray()
train_data, val_data, train_target, val_target = train_test_split(data_norm, data_target, test_size=0.1)
train_data, test_data, train_target, test_target = train_test_split(train_data, train_target, test_size=0.2)
train_dataset = tf.data.Dataset.from_tensor_slices((train_data, train_target))
train_dataset.batch(32)
test_dataset = tf.data.Dataset.from_tensor_slices((test_data, test_target))
test_dataset.batch(32)
val_dataset = tf.data.Dataset.from_tensor_slices((val_data, val_target))
val_dataset.batch(32)
mdl = keras.Sequential([
    keras.layers.Dense(16, input_dim=4, activation='relu'),
    keras.layers.Dense(8, activation='relu'),
    keras.layers.Dense(8, activation='relu'),
    keras.layers.Dense(3, activation='sigmoid')
])
mdl.compile(
    optimizer=keras.optimizers.Adam(0.01),
    loss=keras.losses.categorical_crossentropy,
    metrics=[keras.metrics.categorical_accuracy]
)
history = mdl.fit(train_dataset, epochs=10, steps_per_epoch=15, validation_data=val_dataset)
and I get the following error:
ValueError: Error when checking input: expected dense_16_input to have shape (4,) but got array with shape (1,)
as though Keras assumes the dataset has only one dimension. If I pass input_dim=1 instead, I get a different error:
InvalidArgumentError: Incompatible shapes: [3] vs. [4]
[[{{node metrics_5/categorical_accuracy/Equal}}]] [Op:__inference_keras_scratch_graph_8223]
What is the proper way to use tf.data.Dataset with a Keras model in TensorFlow 2.0?
A few changes should fix your code. First, the batch() dataset transformation does not happen in-place; it returns a new dataset, so you need to assign the result back. Secondly, you should also add a repeat() transformation, so that the dataset continues to yield examples after all of the data has been seen once:
...
train_dataset = tf.data.Dataset.from_tensor_slices((train_data, train_target))
train_dataset = train_dataset.batch(32)   # assign the returned dataset
train_dataset = train_dataset.repeat()    # keep yielding examples across epochs

val_dataset = tf.data.Dataset.from_tensor_slices((val_data, val_target))
val_dataset = val_dataset.batch(32)
val_dataset = val_dataset.repeat()
...
You also need to pass the validation_steps argument to model.fit():
history = mdl.fit(train_dataset, epochs=10, steps_per_epoch=15, validation_data=val_dataset, validation_steps=1)
For your own data, you may need to adjust the validation dataset's batch_size and validation_steps so that the entire validation set is cycled through exactly once per evaluation.
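As a minimal sketch (assuming the val_data array and the batch size of 32 from the question), you can derive validation_steps from the size of the validation set instead of hard-coding it:

import math

val_batch_size = 32
# iris has 150 samples, so test_size=0.1 leaves 15 validation samples;
# ceil(15 / 32) = 1 step covers the whole validation set exactly once
validation_steps = math.ceil(len(val_data) / val_batch_size)
history = mdl.fit(train_dataset, epochs=10, steps_per_epoch=15,
                  validation_data=val_dataset, validation_steps=validation_steps)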