hanugm

Reputation: 1387

Is it possible to store some tensors on the CPU and others on the GPU when training a neural network in PyTorch?

I designed a neural network in PyTorch that demands a lot of GPU memory; otherwise it can only run with a very small batch size.

The GPU runtime error is caused by three lines of code, which store two new tensors and perform some operations on them.

I don't want to run my code with a small batch size. So I want to execute those three lines of code (and hence store those new tensors) on the CPU, and run all the remaining code on the GPU as usual.

Is it possible to do this?

Upvotes: 1

Views: 1727

Answers (1)

Shai

Reputation: 114796

It is possible.
You can use .to(device=torch.device('cpu')) to move the relevant tensors from the GPU to the CPU, and move the result back to the GPU afterwards:

import torch

# a and b are tensors that currently live on the GPU
orig_device = a.device  # remember the device the tensors came from
# move tensors a and b to the CPU
a = a.to(device=torch.device('cpu'))
b = b.to(device=torch.device('cpu'))
# this batched matrix multiplication is now executed on the CPU
res = torch.bmm(a, b)
# move the result back to the original GPU device
res = res.to(device=orig_device)

A few notes:

  1. Moving tensors between devices, or between the GPU and the CPU, is not an unusual event. The term used to describe it is "model parallel" - you can google it for more details and examples.
  2. Note that the .to() operation is not an in-place operation - you must assign the result back to a variable, as done above.
  3. Moving tensors back and forth between GPU and CPU takes time, so it might not be worthwhile to use "model parallelism" of this type here. If you are struggling with GPU memory, you might consider gradient accumulation instead; a minimal sketch is shown below.
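A minimal sketch of gradient accumulation, assuming a toy model, optimizer, loss, and data (all placeholders for your own objects): gradients from several small batches are summed before a single optimizer step, which gives a larger effective batch size without using more GPU memory per forward pass.

import torch
import torch.nn as nn

# hypothetical setup - replace with your own model, loss, and data loader
model = nn.Linear(16, 1).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()
loader = [(torch.randn(8, 16), torch.randn(8, 1)) for _ in range(12)]

accum_steps = 4  # effective batch size = accum_steps * per-batch size

optimizer.zero_grad()
for i, (x, y) in enumerate(loader):
    x, y = x.cuda(), y.cuda()
    loss = loss_fn(model(x), y)
    # scale the loss so the accumulated gradient matches one large batch;
    # backward() adds gradients into .grad without clearing them
    (loss / accum_steps).backward()
    if (i + 1) % accum_steps == 0:
        optimizer.step()       # one weight update per accum_steps batches
        optimizer.zero_grad()  # clear the accumulated gradients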

Upvotes: 1
