Victor

Reputation: 148

Difference between MirroredStrategy and CentralStorageStrategy

I read the documentation of both CentralStorageStrategy and MirroredStrategy, but I cannot understand the essential difference between them.

In MirroredStrategy:

Each variable in the model is mirrored across all the replicas.

In CentralStorageStrategy:

Variables are not mirrored, instead they are placed on the CPU and operations are replicated across all local GPUs.

Source: https://www.tensorflow.org/guide/distributed_training

What does it mean in practice? What are use cases for the CentralStorageStrategy and how does the training work if variables are placed on the CPU in this strategy?

Upvotes: 8

Views: 1399

Answers (1)

isarandi

Reputation: 3349

Consider one particular variable (call it "my_var") in your usual, single-GPU, non-distributed use case (e.g. a weight matrix of a convolutional layer).

If you use 4 GPUs, MirroredStrategy will create 4 variables instead of a single "my_var" variable, one on each GPU. However, each variable will always hold the same value, because they are all updated in the same way. So the variable updates happen in sync on all the GPUs.
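A minimal sketch of what this looks like in code (assuming TensorFlow 2.x and 4 visible GPUs; the variable name "my_var" is just for illustration):

```python
import tensorflow as tf

# Sketch only: assumes 4 local GPUs are visible to TensorFlow.
strategy = tf.distribute.MirroredStrategy()

with strategy.scope():
    # Declared once in code, but mirrored: one copy per GPU.
    my_var = tf.Variable(1.0, name="my_var")

# my_var is a MirroredVariable; printing it shows one component
# per replica, all holding the same value.
print(my_var)
```

Gradients are aggregated across the replicas (an all-reduce), and the same update is applied to every copy, which is how the copies stay identical.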

In the case of CentralStorageStrategy, only one variable is created for "my_var", in host (CPU) memory, so the updates happen in only one place.
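The corresponding sketch for CentralStorageStrategy (note it lives under the experimental namespace in TF 2.x):

```python
import tensorflow as tf

# Sketch only: with more than one local GPU, variables are placed
# on the CPU, while compute is still replicated across the GPUs.
strategy = tf.distribute.experimental.CentralStorageStrategy()

with strategy.scope():
    my_var = tf.Variable(1.0, name="my_var")

# A single copy lives in host memory: each step, the replicas read
# it, compute gradients on their GPU, and the aggregated update is
# applied once on the CPU.
print(my_var.device)
```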

Which one is better probably depends on your machine's topology and on how fast CPU-GPU communication is compared with GPU-GPU communication. If the GPUs can communicate quickly with each other, MirroredStrategy may be more efficient. But I'd benchmark it to be sure.
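A rough, hypothetical benchmark sketch along these lines should do; the model and data here are toy placeholders, not anything specific to your setup:

```python
import time
import tensorflow as tf

def benchmark(strategy):
    """Time a few epochs of a toy model under the given strategy."""
    with strategy.scope():
        model = tf.keras.Sequential([
            tf.keras.Input(shape=(784,)),
            tf.keras.layers.Dense(256, activation="relu"),
            tf.keras.layers.Dense(10),
        ])
        model.compile(
            optimizer="sgd",
            loss=tf.keras.losses.SparseCategoricalCrossentropy(
                from_logits=True),
        )
    x = tf.random.normal((8192, 784))
    y = tf.random.uniform((8192,), maxval=10, dtype=tf.int32)
    start = time.time()
    model.fit(x, y, batch_size=256, epochs=3, verbose=0)
    return time.time() - start

# In practice, run each strategy in a separate process; mixing
# strategies in one process can behave oddly.
print("mirrored:", benchmark(tf.distribute.MirroredStrategy()))
print("central :", benchmark(
    tf.distribute.experimental.CentralStorageStrategy()))
```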

Upvotes: 8
