tensorflow multi-gpu problem with MirroredStrategy (UnBatchDataset?)

Question

Thank you for your interest in the question. (Please excuse any awkward phrasing, as I am using a translation.)

My main question is how to use multi-gpu for custom dataset in tensorflow.

I using

tensorflow : 2.13.0
tensorflow-hub : 0.14.0
Cuda : 11.8
2 GPUs

First, I attemped this:

def build_ds(**params):
    ds = tf.keras.preprocessing.image_dataset_from_directory(
            dataset_path,
            validation_split=0.0,
            image_size=(img_size,img_size),
            batch_size=1,
            label_mode='categorical',
        )
    size_ = ds.cardinality().numpy()
    ds = ds.unbatch().batch(batch_size)
    ds = ds.repeat()
    
    normalization_layer = tf.keras.layers.Rescaling(1. / 255)
    ...
    return ds, size_

def get_model(**model_params):
    model = tf.keras.Sequential([
            ...
            hub.KerasLayer(**params)
            ... ])
    model.build(**params)
    return model
    

---main
mirrored_strategy = tf.distribute.MirroredStrategy()
train_ds, t_size = build_ds(train_data_path, aug=True)
valid_ds, v_size = build_ds(valid_data_path, aug=False)

with mirrored_strategy.scope():
    model = get_model(**model_params)
    model.compile(...)

steps_per_epoch = t_size // BATCH_SIZE
validation_steps = v_size // BATCH_SIZE

hist = model.fit(train_ds,
                    steps_per_epoch=steps_per_epoch,
                    epochs=EPOCH,
                    verbose=1,
                    callbacks=[...],
                    validation_data=valid_ds,
                    validation_steps=validation_steps,
                    use_multiprocessing = True,
                )

but when run this code, following output:

W tensorflow/core/grappler/optimizers/data/auto_shard.cc:786] AUTO sharding policy will apply DATA sharding policy as it failed to apply FILE sharding policy because of the following reason: Did not find a shardable source, walked to a node which is not a dataset: name: "UnbatchDataset/_20

(When I ran this code on a single GPU without MirroredStrategy, it worked correctly.)

Q1 : What is UnBatchDataset ? How can I solve it?
Q2 : When I placed the code within 'with strategy', the warning message disappeared, but the training still did not proceed.

with mirrored_strategy.scope():
    model = get_model(**model_params)
    model.compile(...)

    options = tf.data.Options()
    options.experimental_distribute.auto_shard_policy = tf.data.experimental.AutoShardPolicy.DATA
    train_ds = train_ds.with_options(options)
    valid_ds = valid_ds.with_options(options)

steps_per_epoch = t_size // BATCH_SIZE
validation_steps = v_size // BATCH_SIZE

hist = model.fit(train_ds,
                    steps_per_epoch=steps_per_epoch,
                    epochs=EPOCH,
                    verbose=1,
                    callbacks=[...],
                    validation_data=valid_ds,
                    validation_steps=validation_steps,
                    use_multiprocessing = True,
                )

I need some help. Thank you!

Try:

options = tf.data.Options()
options.experimental_distribute.auto_shard_policy = tf.data.experimental.AutoShardPolicy.DATA
train_ds = train_ds.with_options(options)
valid_ds = valid_ds.with_options(options)

train_ds = mirrored_strategy.experimental_distribute_dataset(train_ds)
valid_ds = mirrored_strategy.experimental_distribute_dataset(valid_ds)

Expectation: I wanna see training progress bar.

Result:

W tensorflow/core/grappler/optimizers/data/auto_shard.cc:786] AUTO sharding policy will apply DATA sharding policy as it failed to apply FILE sharding policy because of the following reason: Did not find a shardable source, walked to a node which is not a dataset: name: "UnbatchDataset/_20

tensorflow multi-gpu problem with MirroredStrategy (UnBatchDataset?)

Answers (0)

Related Questions