Masoumeh Javanbakht

Reputation: 135

Training a model on GPU is very slow

I am using an A100-SXM4-40GB GPU, but training is terribly slow. I tried two models, a simple classifier on CIFAR and a U-Net on Cityscapes. The same code runs fine on other GPUs, so I do not understand why training on this high-capacity GPU is so slow.

I would appreciate any help.

Here are the GPUs available on the machine:

GPU 0: A100-SXM4-40GB
GPU 1: A100-SXM4-40GB
GPU 2: A100-SXM4-40GB
GPU 3: A100-SXM4-40GB

Upvotes: 0

Views: 3539

Answers (2)

Masoumeh Javanbakht

Reputation: 135

Thank you for your answer. Before trying it, I uninstalled and reinstalled Anaconda, and that solved the problem.

Upvotes: 1

deepconsc

Reputation: 581

Call .cuda() on the model during initialization.

As per your comments above, you have the GPUs as well as CUDA installed, so there's no point in checking device availability with torch.cuda.is_available().

Additionally, you should wrap your model in nn.DataParallel to allow PyTorch to use every GPU you expose it to. You could also use DistributedDataParallel, but DataParallel is easier to grasp initially.

Example initialization:

model = UNet().cuda()
model = torch.nn.DataParallel(model)
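
For reference, a minimal DistributedDataParallel sketch, assuming the script is launched with torchrun (e.g. torchrun --nproc_per_node=4 train_unet.py) and that UNet is your model class:

import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group(backend="nccl")      # one process per GPU
local_rank = int(os.environ["LOCAL_RANK"])   # set by torchrun
torch.cuda.set_device(local_rank)

model = UNet().cuda(local_rank)              # your model class; UNet is a placeholder
model = DDP(model, device_ids=[local_rank])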

Also, you can make sure the script sees all GPUs by setting the CUDA_VISIBLE_DEVICES environment variable when launching it:

CUDA_VISIBLE_DEVICES=0,1,2,3 python3 train_unet.py
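
If you want to double-check how many GPUs PyTorch actually sees, something like this inside the script should confirm it:

import torch
print(torch.cuda.device_count())       # should print 4 with the setting above
print(torch.cuda.get_device_name(0))   # e.g. 'A100-SXM4-40GB'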

Last thing to note: nn.DataParallel wraps the model, so to save the state_dict you'll need to reach the module attribute inside DataParallel:

torch.save(model.module.state_dict(), 'unet.pth')
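
And if you later load that checkpoint into a plain (non-DataParallel) model, something like this should work, assuming the same UNet class:

model = UNet()
model.load_state_dict(torch.load('unet.pth', map_location='cpu'))
model = model.cuda().eval()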

Upvotes: 0
