Amarnath R

Reputation: 1003

Tensorflow - Keras: Consider either turning off auto-sharding or switching the auto_shard_policy to DATA to shard this dataset

While training a model in Keras / TensorFlow:

The code snippet:

strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy()

I got the error / warning below:

Consider either turning off auto-sharding or switching the auto_shard_policy to DATA to shard this dataset. You can do this by creating a new `tf.data.Options()` object then setting `options.experimental_distribute.auto_shard_policy = AutoShardPolicy.DATA` before applying the options object to the dataset via `dataset.with_options(options)`.
    2020-12-16 17:12:20.885741: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:127] None of the MLIR optimization passes are enabled (registered 2)
    2020-12-16 17:12:20.905570: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 3593105000 Hz
    Epoch 1/40

Any help is appreciated.

Upvotes: 18

Views: 14579

Answers (1)

Graham501617

Reputation: 469

This warning is new in TensorFlow 2.4.0. While it hints at a solution, it presupposes that your data is an object of type tf.data.Dataset. There was previously no strict requirement to have your input data in this form (e.g. NumPy arrays were fine), but the distribute strategies (e.g. tf.distribute.MirroredStrategy()) now seem to require it. In any event, there does not appear to be a way to avoid TensorFlow's latest console-vomit without wrapping your data in a Dataset object.

So supposing your current code looks something like this:

import numpy as np
import tensorflow as tf

strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy()
with strategy.scope():
    model = ...  # awesome model definition

train_x, train_y = np.array(...), np.array(...)
val_x, val_y = np.array(...), np.array(...)

batch_size = 32
model.fit(train_x, train_y, batch_size=batch_size, validation_data=(val_x, val_y))

It needs to be changed to look like this:

strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy()
with strategy.scope():
    model = ... # awesome model definition

train_x, train_y = np.array(...), np.array(...)
val_x, val_y = np.array(...), np.array(...)

# Wrap data in Dataset objects.
train_data = tf.data.Dataset.from_tensor_slices((train_x, train_y))
val_data = tf.data.Dataset.from_tensor_slices((val_x, val_y))

# The batch size must now be set on the Dataset objects.
batch_size = 32
train_data = train_data.batch(batch_size)
val_data = val_data.batch(batch_size)

# Disable AutoShard.
options = tf.data.Options()
options.experimental_distribute.auto_shard_policy = tf.data.experimental.AutoShardPolicy.OFF
train_data = train_data.with_options(options)
val_data = val_data.with_options(options)

model.fit(train_data, validation_data=val_data)
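Alternatively, the warning's own suggestion also works: leave auto-sharding on but switch the policy to DATA, so each worker processes the dataset and keeps only its own slice rather than trying to shard by file. A minimal sketch of that variant, reusing the train_data / val_data objects from above:

# Shard by data instead of disabling auto-sharding entirely.
options = tf.data.Options()
options.experimental_distribute.auto_shard_policy = tf.data.experimental.AutoShardPolicy.DATA
train_data = train_data.with_options(options)
val_data = val_data.with_options(options)

model.fit(train_data, validation_data=val_data)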

Note that if you don't set the batch size on the Dataset object, you'll get a cryptic error like this:

File "/usr/lib/python3.8/site-packages/tensorflow/python/data/experimental/ops/distribute.py", line 496, in get_static_batch_dim
    return output_shape.dims[0].value
IndexError: list index out of range
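If you want to sanity-check that the batch dimension actually made it into the pipeline, printing the dataset's element_spec before calling fit is a quick way to see it (the shapes below are just illustrative):

# After .batch(), every TensorSpec in element_spec gains a leading batch
# dimension of None (the last batch may be partial). If it's missing,
# you forgot to call .batch() and will hit the IndexError above.
print(train_data.element_spec)
# Illustrative output for 28x28 float features with integer labels:
# (TensorSpec(shape=(None, 28, 28), dtype=tf.float32, name=None),
#  TensorSpec(shape=(None,), dtype=tf.int64, name=None))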

Upvotes: 28
