Jack Shi

Reputation: 23

Tensorflow CentralStorageStrategy

The tf.distribute.experimental.CentralStorageStrategy specifies that variables are not mirrored; instead, they are placed on the CPU and ops are replicated across all GPUs.
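For reference, here is a minimal sketch of how I understand the strategy is meant to be used (the model and layer sizes are placeholders, not my real model):

```python
import tensorflow as tf

# Rough sketch: variables created inside the strategy's scope should live on
# the CPU ("central storage"), while compute is replicated across the GPUs.
strategy = tf.distribute.experimental.CentralStorageStrategy()

with strategy.scope():
    # Placeholder model; in my case the embedding table is much larger.
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(input_dim=1_000_000, output_dim=256),
        tf.keras.layers.GlobalAveragePooling1D(),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")
```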

If I have a really big model that does not fit on any single GPU, could this be a solution since variables are stored on CPU? I know that there will be networking overhead and that's fine.

This official TF tutorial on YouTube states that this could be used to handle "large embeddings" that would not fit on one GPU. Could this also be the case for large variables and activations?

The official documentation states that "if there is only one GPU, all variables and operations will be placed on that GPU." If I only used 1 GPU, it seems that CentralStorageStrategy would effectively be disabled, even though storing large variables (embeddings, for example) on the CPU instead of the GPU could be very valuable, since no single GPU might have enough memory to hold them on device. Is this a design oversight or intended behavior?
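To make that concrete, this is roughly how I would check the placement the documentation describes (the variable shape is arbitrary, and the expected output is just my reading of the docs):

```python
import tensorflow as tf

# Hypothetical check of the documented single-GPU behaviour: with only one GPU
# visible, a variable created under the strategy should be placed on that GPU
# rather than kept on the CPU.
strategy = tf.distribute.experimental.CentralStorageStrategy()

with strategy.scope():
    v = tf.Variable(tf.zeros([4096, 4096]))

# On a one-GPU machine I would expect this to report a GPU device, per the docs.
print(v.device)
```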

Upvotes: 2

Views: 336

Answers (0)
