sixtytrees

Reputation: 1233

torch.cuda.device_count() returns 2, but torch.load(model_path, map_location='cuda:1') throws an error

I have two GPUs, and the following check confirms it:

import torch
print('count: ', torch.cuda.device_count())  # prints count: 2

However, loading my model throws this error:

RuntimeError: Attempting to deserialize object on CUDA device 2 but torch.cuda.device_count() is 1

on the line

torch.load(model_path, map_location='cuda:1')

What could cause this, and how can I fix it?

The issue seems to be linked to Flask, because the training script itself works with torch.load(model_path, map_location='cuda:1').

Upvotes: 2

Views: 1128

Answers (2)

Анастасия 86

Reputation: 56

This is a known interaction between Flask and CUDA. Add print('count: ', torch.cuda.device_count()) to your app, run it under Flask, and check whether the output looks like this:

count: 2
reloading
count: 1

If so, the device count changes once Flask's reloader restarts the app. Disable the reloader with app.run(... , use_reloader=False).
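
A minimal sketch of that change, assuming a conventional Flask entry point. The names RUN_KWARGS and serve are hypothetical and not from the question, and the model loading itself is left out. Debug mode stays on and only the reloader is switched off, so the process that counts the GPUs is the same one that later calls torch.load.

```python
# Keyword arguments passed to Flask's app.run(). With use_reloader=False the
# app stays in a single process instead of being restarted by the reloader.
RUN_KWARGS = {"debug": True, "use_reloader": False}


def serve():
    """Hypothetical entry point; needs flask and torch to be installed."""
    from flask import Flask
    import torch

    app = Flask(__name__)
    # With the reloader off, this line should print only once.
    print("count: ", torch.cuda.device_count())
    app.run(**RUN_KWARGS)

# Call serve() from your own start-up script.
```

If the count is still wrong with the reloader off, the cause is probably elsewhere, for example in how the worker process is launched.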

Upvotes: 1

Valentin Goldité

Reputation: 1219

You say:

print('count: ', torch.cuda.device_count())  # prints count: 2

But the error says:

torch.cuda.device_count() is 1

Could you confirm that you run both in the same worker?

Edit: Judging by the message I got when I tried to assign to the wrong GPU, this could be due to asynchronous process calls. You can debug it by setting os.environ['CUDA_LAUNCH_BLOCKING']='1'.
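
One way to check whether both calls happen in the same worker is to print a few facts about the current process right before torch.load. This is only a sketch: describe_process is a hypothetical helper. It assumes that the server runs on Werkzeug, which sets WERKZEUG_RUN_MAIN to "true" in the reloader's child process, and that CUDA_VISIBLE_DEVICES, if set, limits which GPUs the process can see.

```python
import os


def describe_process():
    """Collect facts that identify the process about to call torch.load."""
    info = {
        "pid": os.getpid(),
        # Werkzeug sets this to "true" inside the reloader's child process.
        "werkzeug_run_main": os.environ.get("WERKZEUG_RUN_MAIN"),
        # If set, this restricts which GPUs the process can see.
        "cuda_visible_devices": os.environ.get("CUDA_VISIBLE_DEVICES"),
    }
    try:
        import torch
        info["device_count"] = torch.cuda.device_count()
    except ImportError:
        info["device_count"] = None  # torch is not installed in this environment
    return info


print(describe_process())
```

If the pid or the device count differs between the place where you print the count and the place where you load the model, the two run in different processes.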

Upvotes: 1
