Reputation: 3393
I've added a GeForce GTX 1080 Ti to my machine (running Ubuntu 18.04 and Anaconda with Python 3.7) to utilize the GPU with PyTorch. Both cards are correctly identified:
$ lspci | grep VGA
03:00.0 VGA compatible controller: NVIDIA Corporation GF119 [NVS 310] (rev a1)
04:00.0 VGA compatible controller: NVIDIA Corporation GP102 [GeForce GTX 1080 Ti] (rev a1)
The NVS 310 handles my 2-monitor setup; I only want to utilize the 1080 Ti for PyTorch. I also installed the latest NVIDIA driver currently in the repository, and that seems to be fine:
$ nvidia-smi
Sat Jan 19 12:42:18 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.87 Driver Version: 390.87 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 NVS 310 Off | 00000000:03:00.0 N/A | N/A |
| 30% 60C P0 N/A / N/A | 461MiB / 963MiB | N/A Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX 108... Off | 00000000:04:00.0 Off | N/A |
| 0% 41C P8 10W / 250W | 2MiB / 11178MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 Not Supported |
+-----------------------------------------------------------------------------+
Driver version 390.xx allows running CUDA 9.1 (9.1.85) according to the NVIDIA docs. Since this is also the version in the Ubuntu repositories, I simply installed the CUDA Toolkit with:
$ sudo apt-get install nvidia-cuda-toolkit
And again, this seems to be alright:
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85
and
$ apt-cache policy nvidia-cuda-toolkit
nvidia-cuda-toolkit:
Installed: 9.1.85-3ubuntu1
Candidate: 9.1.85-3ubuntu1
Version table:
*** 9.1.85-3ubuntu1 500
500 http://sg.archive.ubuntu.com/ubuntu bionic/multiverse amd64 Packages
100 /var/lib/dpkg/status
Lastly, I've installed PyTorch from scratch with conda:
conda install pytorch torchvision -c pytorch
Also no errors, as far as I can tell:
$ conda list
...
pytorch 1.0.0 py3.7_cuda9.0.176_cudnn7.4.1_1 pytorch
...
However, PyTorch doesn't seem to find CUDA:
$ python -c 'import torch; print(torch.cuda.is_available())'
False
In more detail, if I force PyTorch to convert a tensor x to CUDA with x.cuda(), I get the error:
Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://...
What am I missing here? I'm new to this, but I think I've already searched the Web quite a bit for caveats such as mismatched NVIDIA driver and CUDA toolkit versions.
EDIT: Some more outputs from PyTorch:
print(torch.cuda.device_count()) # --> 0
print(torch.cuda.is_available()) # --> False
print(torch.version.cuda) # --> 9.0.176
Upvotes: 18
Views: 56228
Reputation: 7165
In my case, Ubuntu is running under WSL, and the WSL version has an influence; see https://github.com/pytorch/pytorch/issues/73487
Upvotes: 0
Reputation: 3
As mentioned before, you will need to set CUDA_VISIBLE_DEVICES.
If you want to use only GPU 1 (the 1080 Ti in the nvidia-smi output above), it would be:
CUDA_VISIBLE_DEVICES=1
If you want a more complex setup, you can find more details in the following link: How do I select which GPU to run a job on?
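For example, a minimal sketch of setting this from inside Python (assuming GPU index 1 is the 1080 Ti, as in the nvidia-smi listing above; the variable must be set before CUDA is initialized):
import os

# Expose only GPU 1 to CUDA; set this at the very top of the script,
# before torch initializes CUDA (or export it in the shell before launching).
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

import torch

print(torch.cuda.is_available())   # True if driver/toolkit are set up correctly
print(torch.cuda.device_count())   # 1 -- only the exposed card is visible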
Upvotes: 0
Reputation: 1
You can load the data and the model onto a GPU. You can create data loaders and run them on your local system if it has GPU support, or, for example, on a Kaggle or Colab server. You can change batch_size, num_workers, etc. depending on your system if running it locally.
import torch
from torch.utils.data import DataLoader

def get_default_device():
    """Pick GPU if available, else CPU"""
    if torch.cuda.is_available():
        return torch.device('cuda')
    else:
        return torch.device('cpu')

def to_device(data, device):
    """Move tensor(s) to chosen device"""
    if isinstance(data, (list, tuple)):
        return [to_device(x, device) for x in data]
    return data.to(device, non_blocking=True)

class DeviceDataLoader():
    """Wrap a dataloader to move data to a device"""
    def __init__(self, dl, device):
        self.dl = dl
        self.device = device

    def __iter__(self):
        """Yield a batch of data after moving it to device"""
        for b in self.dl:
            yield to_device(b, self.device)

    def __len__(self):
        """Number of batches"""
        return len(self.dl)
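A brief usage sketch for the wrapper above (the toy dataset and batch size here are illustrative, not part of the original answer):
from torch.utils.data import TensorDataset

# Hypothetical toy data, just to show how the wrapper is used.
xs = torch.randn(256, 10)
ys = torch.randint(0, 2, (256,))

device = get_default_device()
train_dl = DeviceDataLoader(DataLoader(TensorDataset(xs, ys), batch_size=64), device)

for xb, yb in train_dl:
    print(xb.device)  # batches arrive already on `device`
    break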
Upvotes: 0
Reputation: 3104
I have had the same issue when trying to use PyTorch to train on our server (which has 4 GPUs), so I didn't have the option of just removing GPUs.
However, I am using Docker and docker-compose to run my training, so I found this PyTorch image from NVIDIA that comes with all the necessary setup. Before you pull the image, make sure to check this page to determine which image tag is compatible with your NVIDIA driver version (if you pull the wrong one, it won't work).
Then, in your docker-compose file, you can specify which GPUs to use as follows:
version: '3.5'

services:
  training:
    build:
      context: ""
      dockerfile: Dockerfile
    container_name: training
    environment:
      - CUDA_VISIBLE_DEVICES=0,2
    ipc: "host"
Make sure to set ipc to "host", which allows your Docker container to use the host's shared memory rather than the (insufficient) amount allocated to the container by default.
Upvotes: 1
Reputation: 46291
Since you have two graphics cards, selecting a card ID with CUDA_VISIBLE_DEVICES=GPU_ID should fix the problem, as per this explanation.
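As a quick sanity check, a script along these lines (launched e.g. as CUDA_VISIBLE_DEVICES=1 python check.py; the script name is just an example) should then report only the 1080 Ti:
import os
import torch

# With CUDA_VISIBLE_DEVICES=1 set at launch, only the 1080 Ti is exposed,
# and it appears to PyTorch as device 0.
print(os.environ.get("CUDA_VISIBLE_DEVICES"))
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))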
Upvotes: 1