Reputation: 1233
I have two GPUs. When I run
import torch
print('count: ', torch.cuda.device_count()) # prints count: 2
both GPUs are reported. However, loading my model throws the error
RuntimeError: Attempting to deserialize object on CUDA device 2 but torch.cuda.device_count() is 1
on the line
torch.load(model_path, map_location='cuda:1')
What could cause this, and how can I fix it?
This issue seems somehow linked to Flask, because the training itself works fine with torch.load(model_path, map_location='cuda:1').
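For reference, a stripped-down sketch of how the load ends up running inside the Flask process (the route name, the model.pt path, and the debug=True run are placeholders/assumptions, not my real code):

import torch
from flask import Flask, jsonify

app = Flask(__name__)

print('count: ', torch.cuda.device_count())  # prints count: 2 when run directly

# the checkpoint is loaded when the module runs, i.e. inside the Flask process
model = torch.load('model.pt', map_location='cuda:1')

@app.route('/predict')
def predict():
    # inference with the preloaded model would go here
    return jsonify({'ok': True})

if __name__ == '__main__':
    app.run(debug=True)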
Upvotes: 2
Views: 1128
Reputation: 56
This is a known issue with Flask and CUDA. Run your Flask app with
print('count: ', torch.cuda.device_count())
and check if you see
count: 2
reloading
count: 1
If so, the debug reloader is restarting your app in a second process; disable it with app.run(..., use_reloader=False)
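A minimal sketch of the fix, assuming the model is loaded when the module runs (the layout and the model.pt path below are illustrative, not taken from the question):

import torch
from flask import Flask

app = Flask(__name__)

# loaded once, in the single server process
model = torch.load('model.pt', map_location='cuda:1')

if __name__ == '__main__':
    # use_reloader=False keeps Flask in one process; with the debug reloader
    # enabled, the module is re-executed in a child process, where
    # torch.cuda.device_count() can come back different (the 2 vs 1 above)
    app.run(debug=True, use_reloader=False)

The trade-off is that you lose automatic restarts on code changes, so you have to restart the server manually after editing.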
Upvotes: 1
Reputation: 1219
You say:
print('count: ', torch.cuda.device_count()) # prints count: 2
But the error says:
torch.cuda.device_count() is 1
Could you confirm that you run the two in the same worker?
Edit: based on the message I got when trying to assign to the wrong GPU, it could be due to asynchronous process calls. You can debug with os.environ['CUDA_LAUNCH_BLOCKING'] = '1'.
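A sketch of how that flag is typically applied; it has to be set before CUDA is initialized, so set it before the first CUDA call (the load line and path are placeholders):

import os

# must be set before CUDA is initialized; kernel launches then run
# synchronously, so errors surface at the line that actually caused them
os.environ['CUDA_LAUNCH_BLOCKING'] = '1'

import torch

print('count: ', torch.cuda.device_count())
model = torch.load('model.pt', map_location='cuda:1')  # placeholder path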
Upvotes: 1