Reputation: 1003
While training a model in Keras / TensorFlow with the following code snippet:
strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy()
I got the error / warning below:
Consider either turning off auto-sharding or switching the auto_shard_policy to DATA to shard this dataset. You can do this by creating a new `tf.data.Options()` object then setting `options.experimental_distribute.auto_shard_policy = AutoShardPolicy.DATA` before applying the options object to the dataset via `dataset.with_options(options)`.
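For context, the `auto_shard_policy = DATA` route the warning suggests looks roughly like this (a minimal sketch with a placeholder dataset, not the questioner's actual pipeline):

```python
import tensorflow as tf

# Placeholder dataset standing in for the real input pipeline.
dataset = tf.data.Dataset.from_tensor_slices(list(range(8))).batch(2)

# Follow the warning's suggestion: shard by data instead of by file.
options = tf.data.Options()
options.experimental_distribute.auto_shard_policy = tf.data.experimental.AutoShardPolicy.DATA
dataset = dataset.with_options(options)
```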
2020-12-16 17:12:20.885741: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:127] None of the MLIR optimization passes are enabled (registered 2)
2020-12-16 17:12:20.905570: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 3593105000 Hz
Epoch 1/40
Any help is appreciated.
Upvotes: 18
Views: 14579
Reputation: 469
This error message is new in TensorFlow 2.4.0. While it hints at a solution, it presupposes that your data is a tf.data.Dataset object. Previously there was no strict requirement to supply input data in this form (numpy arrays were fine, for example), but it now appears to be required by the distribute strategies (e.g. tf.distribute.MirroredStrategy()). In any event, there does not appear to be a way to avoid TensorFlow's latest console-vomit without wrapping your data in a Dataset object.
So supposing your current code looks something like this:
strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy()

with strategy.scope():
    model = ...  # awesome model definition

train_x, train_y = np.array(...), np.array(...)
val_x, val_y = np.array(...), np.array(...)

batch_size = 32
model.fit(train_x, train_y, batch_size=batch_size, validation_data=(val_x, val_y))
It needs to be changed to look like this:
strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy()

with strategy.scope():
    model = ...  # awesome model definition

train_x, train_y = np.array(...), np.array(...)
val_x, val_y = np.array(...), np.array(...)

# Wrap data in Dataset objects.
train_data = tf.data.Dataset.from_tensor_slices((train_x, train_y))
val_data = tf.data.Dataset.from_tensor_slices((val_x, val_y))

# The batch size must now be set on the Dataset objects.
batch_size = 32
train_data = train_data.batch(batch_size)
val_data = val_data.batch(batch_size)

# Disable AutoShard.
options = tf.data.Options()
options.experimental_distribute.auto_shard_policy = tf.data.experimental.AutoShardPolicy.OFF
train_data = train_data.with_options(options)
val_data = val_data.with_options(options)

model.fit(train_data, validation_data=val_data)
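To make the pattern concrete, here is a runnable end-to-end sketch. The tiny model and random data are placeholders (not the original poster's code), and tf.distribute.MirroredStrategy() stands in for MultiWorkerMirroredStrategy() so it runs on a single machine:

```python
import numpy as np
import tensorflow as tf

# Single-machine stand-in for MultiWorkerMirroredStrategy().
strategy = tf.distribute.MirroredStrategy()

with strategy.scope():
    # Placeholder model; any compiled Keras model works here.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(8, activation="relu", input_shape=(4,)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")

# Random placeholder data in place of real numpy arrays.
train_x, train_y = np.random.rand(64, 4), np.random.rand(64, 1)
val_x, val_y = np.random.rand(16, 4), np.random.rand(16, 1)

# Wrap in Dataset objects, batch on the Dataset, and disable auto-sharding.
train_data = tf.data.Dataset.from_tensor_slices((train_x, train_y)).batch(32)
val_data = tf.data.Dataset.from_tensor_slices((val_x, val_y)).batch(32)

options = tf.data.Options()
options.experimental_distribute.auto_shard_policy = tf.data.experimental.AutoShardPolicy.OFF
train_data = train_data.with_options(options)
val_data = val_data.with_options(options)

history = model.fit(train_data, validation_data=val_data, epochs=1, verbose=0)
```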
Note that if you don't set the batch size on the Dataset object, you'll get a cryptic error like this:
File "/usr/lib/python3.8/site-packages/tensorflow/python/data/experimental/ops/distribute.py", line 496, in get_static_batch_dim
return output_shape.dims[0].value
IndexError: list index out of range
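The IndexError comes from the distribute code inspecting the leading (batch) dimension of the Dataset's element shape, which an unbatched Dataset simply doesn't have. A quick illustration (with placeholder data):

```python
import tensorflow as tf

ds = tf.data.Dataset.from_tensor_slices(tf.zeros([10, 3]))

# Unbatched: each element has shape (3,) -- there is no batch
# dimension at index 0 for the distribute machinery to read.
print(ds.element_spec.shape)        # (3,)

# Batched: the element shape gains a leading batch dimension
# (None, since the last batch may be smaller).
print(ds.batch(4).element_spec.shape)  # (None, 3)
```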
Upvotes: 28