Adham Ali

Reputation: 91

Enable multiprocessing on PyTorch XLA for TPU VM

I'm fairly new to this and have little to no experience. I had a notebook running PyTorch that I wanted to run on a Google Cloud TPU VM. Machine specs:

- Ubuntu
- TPU v2-8
- pt-2.0

I should have 8 cores. Correct me if I'm wrong.

So, I followed the guidelines for making the notebook TPU-compatible via XLA. I did the following:

import os
os.environ['PJRT_DEVICE'] = 'TPU'

import torch_xla.core.xla_model as xm
import torch_xla.distributed.parallel_loader as pl
import torch_xla.distributed.xla_multiprocessing as xmp

device = xm.xla_device()
print(device)

It printed xla:0.

def _mp_fn(index):
    # model creation
    # data preparation
    # training loop
    ...

if __name__ == '__main__':
    xmp.spawn(_mp_fn, args=())

BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.

I could be totally wrong about all of this, so apologies in advance; if you need a closer look at the code, I can share the notebook. When I follow the guidelines for single-core processing and don't use xmp.spawn, I get 1.2 iterations/sec, which should increase significantly if all cores were used.

Upvotes: 0

Views: 901

Answers (1)

Susie Sargsyan

Reputation: 191

The PJRT runtime should be fully supported starting with TPU v4. On a v2-8 you still need to use the XRT runtime. For that purpose you need to set two environment variables:

os.environ['TPU_NUM_DEVICES'] = '8'  # values in os.environ must be strings
os.environ['XRT_TPU_CONFIG'] = 'localservice;0;localhost:51011'
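As far as I can tell, these should be set before torch_xla is imported, so that the XRT runtime picks them up when it initializes.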

I would first suggest testing an example like https://pytorch.org/xla/release/2.0/index.html#running-on-multiple-xla-devices-with-multi-processing to make sure everything is set up and works correctly. Then you can work on your model.
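For reference, a minimal multi-process sanity check along the lines of that page could look roughly like this (a sketch of the setup, not your training code):

import os

# XRT configuration for a v2-8; both values must be strings.
os.environ['TPU_NUM_DEVICES'] = '8'
os.environ['XRT_TPU_CONFIG'] = 'localservice;0;localhost:51011'

import torch
import torch_xla.core.xla_model as xm
import torch_xla.distributed.xla_multiprocessing as xmp

def _mp_fn(index):
    # Each spawned process gets its own XLA device.
    device = xm.xla_device()
    t = torch.randn(2, 2, device=device)
    print(index, t.device)

if __name__ == '__main__':
    xmp.spawn(_mp_fn, args=())

If this prints one line per core, the runtime is working and you can move your model creation and training loop into _mp_fn.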

Upvotes: 0
