Joy

Reputation: 41

How to use Model Parallelism with a custom Tensorflow 2.0 model on TPUs?

To replicate Multimodal Few-Shot Learning with Frozen Language Models, I am trying to train a ~7B parameter subclassed TF2 model on a TPUv3-32. Out of the 7B parameters, roughly 6B parameters are frozen.

I want to use model and data parallelism to train it as efficiently as possible. As far as I know, MeshTensorflow can only be used for models written in TF1.

I tried using experimental_device_assignment from TPUStrategy, but it placed all of the variables on only the first (0th) TPU core, which quickly ran out of memory.

Using TPUStrategy on a TPUv3-8, I tried setting computation_shape = [2, 2, 1, 2] and [1, 1, 1, 2] with num_replicas = 1, but neither configuration worked (a sketch of the setup is below).
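Roughly what I tried, as a minimal sketch; the TPU name and the model class are placeholders:

```python
import tensorflow as tf

resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="my-tpu")  # placeholder name
tf.config.experimental_connect_to_cluster(resolver)
topology = tf.tpu.experimental.initialize_tpu_system(resolver)

# One replica spanning all 8 cores of the v3-8
# (computation_shape is [x, y, z, cores_per_chip]).
device_assignment = tf.tpu.experimental.DeviceAssignment.build(
    topology,
    computation_shape=[2, 2, 1, 2],
    num_replicas=1)

strategy = tf.distribute.TPUStrategy(
    resolver, experimental_device_assignment=device_assignment)

with strategy.scope():
    model = MyFrozenModel()  # placeholder for the ~7B parameter subclassed model
```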

I am also open to using GPUs to train it.

Upvotes: 0

Views: 595

Answers (1)

Mike Holcomb

Reputation: 413

According to the Cloud TPU documentation, there is no official support:

Does Cloud TPU support model parallelism?

Model parallelism (or executing non-identical programs on the multiple cores within a single Cloud TPU device) is not currently supported.

https://cloud.google.com/tpu/docs/faq

The underlying issue may be that TPUStrategy does not automatically shard the computation graph, so the whole graph is placed on a single device unless, in the model code, you manually assign device placements for weights and operations to the logical devices created by DeviceAssignment.build and carefully handle communication across those devices.
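For illustration only, a rough sketch of what that manual assignment could look like with the experimental_assign_to_logical_device hook that more recent TF 2 releases expose on TPUStrategy; the TPU name and the two sub-model builders are placeholders, not a tested recipe:

```python
import tensorflow as tf

resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="my-tpu")  # placeholder name
tf.config.experimental_connect_to_cluster(resolver)
topology = tf.tpu.experimental.initialize_tpu_system(resolver)

# On a v3-8: 4 replicas, each spanning 2 cores, i.e. 2 logical devices per replica.
device_assignment = tf.tpu.experimental.DeviceAssignment.build(
    topology, computation_shape=[1, 1, 1, 2], num_replicas=4)

strategy = tf.distribute.TPUStrategy(
    resolver, experimental_device_assignment=device_assignment)

with strategy.scope():
    frozen_lm = build_frozen_lm()            # placeholder: the frozen ~6B parameters
    trainable_head = build_trainable_head()  # placeholder: the trainable part

@tf.function
def train_step(inputs):
    def step_fn(x):
        # Pin the frozen language model's output to logical device 0 ...
        h = strategy.experimental_assign_to_logical_device(frozen_lm(x), 0)
        # ... and the trainable head's output to logical device 1.
        return strategy.experimental_assign_to_logical_device(trainable_head(h), 1)
    return strategy.run(step_fn, args=(inputs,))
```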

That said, there is another TF2-compatible library (also from Google) that could help if you are building a large Transformer and want layers that are friendly to graph sharding: Lingvo. Its GitHub repository includes an example of sharding a model across a TPU v3-512 node. The library also ships Google's open-sourced GPipe, which can speed up model-parallel training loops. Lingvo should work on GPUs as well.

Upvotes: 2
