Jacob Stern

Reputation: 4587

CUDA out of memory error when reloading Pytorch model

A common PyTorch error, but I'm seeing it under an unusual circumstance: when reloading a model, I get a CUDA out-of-memory error even though I haven't yet moved the model to the GPU.

model.load_state_dict(torch.load(model_file_path))
optimizer.load_state_dict(torch.load(optimizer_file_path))
# Error happens here ^, before I send the model to the device.
model = model.to(device_id)

Upvotes: 4

Views: 4442

Answers (1)

Jacob Stern

Reputation: 4587

The issue is that I was trying to load onto a new GPU (cuda:2), but the model and optimizer were originally saved from a different GPU (cuda:0). Even though I never asked for the previous GPU, torch.load by default restores tensors to the device they were saved from, and that GPU happened to be occupied.

Adding map_location=device_id to each torch.load call fixed the problem:

# load_state_dict modifies the model in place; don't reassign its return value.
model.load_state_dict(torch.load(model_file_path, map_location=device_id))
optimizer.load_state_dict(torch.load(optimizer_file_path, map_location=device_id))
model.to(device_id)
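For reference, a minimal sketch of device-agnostic checkpoint loading (the tiny model and file path below are illustrative, not from the question): loading with `map_location="cpu"` first means no GPU memory is touched at load time regardless of which device the checkpoint was saved from, and `.to(device)` then places the weights wherever you want.

```python
import os
import tempfile

import torch
import torch.nn as nn

# A tiny illustrative model (hypothetical; substitute your own).
model = nn.Linear(4, 2)

# Save a checkpoint, as in the question.
ckpt_path = os.path.join(tempfile.mkdtemp(), "model.pt")
torch.save(model.state_dict(), ckpt_path)

# Load to CPU first so the original GPU is never allocated,
# no matter which device the checkpoint was saved from.
state = torch.load(ckpt_path, map_location="cpu")
model.load_state_dict(state)

# Only now move to the target device (falls back to CPU here).
device = "cuda:2" if torch.cuda.device_count() > 2 else "cpu"
model = model.to(device)
```

`map_location` also accepts a dict remapping device strings, e.g. `torch.load(path, map_location={"cuda:0": "cuda:2"})`, if you want to redirect tensors from one GPU to another without going through the CPU.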

Upvotes: 9
