Reputation: 135
I am using an A100-SXM4-40GB GPU,
but training is terribly slow. I tried two models, a simple classifier on CIFAR and a U-Net on Cityscapes. The same code runs fine on other GPUs, so I do not know why training on this high-capacity GPU is so slow.
I would appreciate any help.
Here are the GPUs reported on the machine:
GPU 0: A100-SXM4-40GB
GPU 1: A100-SXM4-40GB
GPU 2: A100-SXM4-40GB
GPU 3: A100-SXM4-40GB
Upvotes: 0
Views: 3539
Reputation: 135
Thank you for your answer. Before trying it, I uninstalled and reinstalled Anaconda, and that solved the problem.
Upvotes: 1
Reputation: 581
Call .cuda() on the model during initialization. As per your comments above, you have GPUs and CUDA installed, so there is no point in checking device availability with torch.cuda.is_available().
Additionally, you should wrap your model in nn.DataParallel so that PyTorch uses every GPU you expose to it. You could also use DistributedDataParallel, but DataParallel is easier to grasp initially.
Example initialization:
model = UNet().cuda()                   # move the model's parameters to the GPU
model = torch.nn.DataParallel(model)    # split each batch across all visible GPUs
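For completeness, here is a minimal sketch of a training step with the wrapped model from above; train_loader, the loss, and the optimizer are illustrative placeholders rather than code from the question. The batches are typically moved to the GPU as well:

import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()                    # placeholder loss for a segmentation/classification task
optimizer = torch.optim.Adam(model.parameters())     # `model` is the DataParallel-wrapped UNet above

for images, targets in train_loader:                 # `train_loader` is a placeholder DataLoader
    images, targets = images.cuda(), targets.cuda()  # move the batch to the GPU as well
    optimizer.zero_grad()
    outputs = model(images)                          # DataParallel splits the batch across the GPUs
    loss = criterion(outputs, targets)
    loss.backward()
    optimizer.step()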
Also, you can make sure the code is exposed to all GPUs by executing the Python script with the following environment variable set:
CUDA_VISIBLE_DEVICES=0,1,2,3 python3 train_unet.py
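You can also confirm from inside Python how many devices PyTorch actually sees (a quick sanity check, not part of the original answer):

import torch
print(torch.cuda.device_count())      # should print 4 when all four A100s are exposed
print(torch.cuda.get_device_name(0))  # e.g. 'A100-SXM4-40GB'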
One last thing to note: nn.DataParallel wraps the model itself, so to save the state_dict you need to reach the module inside DataParallel:
torch.save(model.module.state_dict(), 'unet.pth')
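For reference, a state_dict saved this way can later be loaded into a plain, unwrapped model (the file name is taken from the save call above):

import torch

model = UNet()                                          # plain model, no DataParallel wrapper
state_dict = torch.load('unet.pth', map_location='cpu')
model.load_state_dict(state_dict)
model = model.cuda()                                    # move it back to the GPU if needed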
Upvotes: 0