harold

Reputation: 139

WSL2 Pytorch - RuntimeError: No CUDA GPUs are available with RTX3080

I have been struggling for days to make torch work on WSL2 with an RTX 3080.

I installed the CUDA Toolkit, version 11.3.

Running nvcc -V returns:

nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Mar_21_19:15:46_PDT_2021
Cuda compilation tools, release 11.3, V11.3.58
Build cuda_11.3.r11.3/compiler.29745058_0

nvidia-smi returns:

Mon Nov 29 00:38:26 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.00       Driver Version: 510.06       CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0  On |                  N/A |
| N/A   52C    P5    17W /  N/A |   1082MiB / 16384MiB |     N/A      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
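The two outputs above can be compared mechanically. The "CUDA Version" field in the nvidia-smi header is generally read as the newest CUDA version the installed driver supports, so a toolkit release at or below it should be usable. A minimal sketch, assuming the exact output formats shown above; the helper names are hypothetical:

```python
import re


def parse_nvcc_release(nvcc_output: str) -> str:
    """Extract the toolkit release (e.g. '11.3') from `nvcc -V` output."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    if match is None:
        raise ValueError("no 'release X.Y' found in nvcc output")
    return match.group(1)


def parse_smi_cuda_version(smi_output: str) -> str:
    """Extract the 'CUDA Version: X.Y' field from the nvidia-smi header."""
    match = re.search(r"CUDA Version:\s*(\d+\.\d+)", smi_output)
    if match is None:
        raise ValueError("no 'CUDA Version' field found in nvidia-smi output")
    return match.group(1)


def as_tuple(version: str) -> tuple:
    """Turn '11.3' into (11, 3) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


# Lines copied from the outputs in this question.
nvcc_out = "Cuda compilation tools, release 11.3, V11.3.58"
smi_out = "| NVIDIA-SMI 510.00       Driver Version: 510.06       CUDA Version: 11.6     |"

toolkit = parse_nvcc_release(nvcc_out)
driver_max = parse_smi_cuda_version(smi_out)
print(toolkit, driver_max, as_tuple(toolkit) <= as_tuple(driver_max))
```

Here that prints `11.3 11.6 True`, so the toolkit/driver pairing itself does not look like the problem.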

I verified the toolkit installation with the BlackScholes CUDA sample:

./BlackScholes
[./BlackScholes] - Starting...
GPU Device 0: "Ampere" with compute capability 8.6

Initializing data...
...allocating CPU memory for options.
...allocating GPU memory for options.
...generating input data in CPU mem.
...copying input data to GPU mem.
Data init done.

Executing Black-Scholes GPU kernel (512 iterations)...
Options count             : 8000000
BlackScholesGPU() time    : 0.242822 msec
Effective memory bandwidth: 329.459087 GB/s
Gigaoptions per second    : 32.945909

BlackScholes, Throughput = 32.9459 GOptions/s, Time = 0.00024 s, Size = 8000000 options, NumDevsUsed = 1, Workgroup = 128

Reading back GPU results...
Checking the results...
...running CPU calculations.

Comparing the results...
L1 norm: 1.741792E-07
Max absolute error: 1.192093E-05

Shutting down...
...releasing GPU memory.
...releasing CPU memory.
Shutdown done.

[BlackScholes] - Test Summary

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

Test passed

However, when I try to use torch, it doesn't find any GPU. By the way, I had to install torch==1.10.0+cu113 to use torch with my RTX 3080, because the sm_ architectures in the plain 1.10.0 build are not compatible with the RTX 3080.

Running torch returns:

>>> import torch
>>> torch.version
<module 'torch.version' from '/home/snihar/miniconda3/envs/tscbasset/lib/python3.7/site-packages/torch/version.py'>
>>> torch.version.cuda
'11.3'
>>> torch.cuda.get_arch_list()
[]
>>> torch.cuda.device_count()
0
>>>  torch.cuda.current_device()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/snihar/miniconda3/envs/tscbasset/lib/python3.7/site-packages/torch/cuda/__init__.py", line 479, in current_device
    _lazy_init()
  File "/home/snihar/miniconda3/envs/tscbasset/lib/python3.7/site-packages/torch/cuda/__init__.py", line 214, in _lazy_init
    torch._C._cuda_init()
RuntimeError: No CUDA GPUs are available
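In recent PyTorch versions, torch.cuda.get_arch_list() returns an empty list whenever CUDA is unavailable, so the [] above is most likely a consequence of the missing GPU rather than a sign that the build lacks sm_86. A small script that gathers these checks in one place makes it easier to compare the environments mentioned here (WSL2, Docker, Windows). This is a sketch; `cuda_diagnostics` is a hypothetical helper name, and it only calls standard torch / torch.cuda functions:

```python
def cuda_diagnostics() -> dict:
    """Collect the facts from the interactive session above in one dict."""
    try:
        import torch
    except ImportError:
        # torch is not installed in this interpreter/environment.
        return {"torch_installed": False}

    info = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "built_with_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
        "device_count": torch.cuda.device_count(),
        "arch_list": torch.cuda.get_arch_list(),  # [] when CUDA is unavailable
    }
    if info["cuda_available"]:
        # Only query the device when one is actually visible.
        info["device_name"] = torch.cuda.get_device_name(0)
        info["capability"] = torch.cuda.get_device_capability(0)
    return info


if __name__ == "__main__":
    for key, value in cuda_diagnostics().items():
        print(f"{key}: {value}")
```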

Finally, and interestingly, tensorflow-gpu runs without problems on the same machine.

I installed PyTorch like this: conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch

I also managed to run PyTorch in a Docker container started from my WSL2 machine with this command:

sudo docker run --gpus all -it --rm -v /home/...:/home/... nvcr.io/nvidia/pytorch:21.11-py3

PyTorch also works on the Windows machine I run WSL from. In both environments, torch.cuda.get_arch_list() returns ['sm_37', 'sm_50', 'sm_60', 'sm_61', 'sm_70', 'sm_75', 'sm_80', 'sm_86', 'compute_37'], which shows that the library is compatible with the RTX 3080.
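The RTX 3080 reports compute capability 8.6 (see the BlackScholes output above), so the entry that matters is sm_86. A minimal sketch of that check, assuming the arch-list format shown; `arch_supports` is a hypothetical helper that does an exact match only and ignores PTX (compute_XY) entries. On a working install, the inputs would come from torch.cuda.get_arch_list() and torch.cuda.get_device_capability():

```python
def arch_supports(arch_list, major, minor):
    """Return True if a native sm_XY entry for this compute capability is listed.

    Simplified exact-match check: it does not consider PTX (compute_XY)
    entries that could be JIT-compiled for newer GPUs.
    """
    return f"sm_{major}{minor}" in arch_list


reported = ['sm_37', 'sm_50', 'sm_60', 'sm_61', 'sm_70',
            'sm_75', 'sm_80', 'sm_86', 'compute_37']
print(arch_supports(reported, 8, 6))  # RTX 3080: compute capability 8.6
print(arch_supports([], 8, 6))        # the empty list seen inside WSL2
```

The first call prints `True` and the second `False`.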

Upvotes: 12

Views: 10001

Answers (6)

KEMBL

Reputation: 624

In my case, I used the Docker image nvidia/cuda:12.3.2-devel-ubuntu22.04 and torch.cuda.is_available() returned False; after switching to ubuntu:22.04, the problem went away.

As stated in https://docs.nvidia.com/cuda/wsl-user-guide/index.html, having a Linux GPU driver preinstalled is not recommended under WSL.
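On WSL2 the CUDA driver library comes from the Windows driver and is exposed under /usr/lib/wsl/lib (the directory that also appears in other answers in this thread). A quick sanity check, assuming that path and using a kernel-release heuristic for detecting WSL; both helper names are hypothetical:

```python
import os
import platform

# Assumed location of the Windows-provided driver library inside WSL2.
WSL_LIBCUDA = "/usr/lib/wsl/lib/libcuda.so.1"


def running_under_wsl(release=None):
    """Heuristic: WSL kernels include 'microsoft' in their release string."""
    release = platform.release() if release is None else release
    return "microsoft" in release.lower()


def wsl_libcuda_present(path=WSL_LIBCUDA):
    """True if the Windows-provided libcuda is visible at the given path."""
    return os.path.exists(path)


if __name__ == "__main__":
    print("WSL:", running_under_wsl())
    print("Windows-provided libcuda present:", wsl_libcuda_present())
```

If the first check is true but the second is false inside a container, the image or its setup is likely hiding the driver that WSL provides.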

Upvotes: 0

Cyb0rg

Reputation: 11

Note that you need to use cmd.exe, not PowerShell, because mklink is a built-in command of cmd.exe rather than a standalone program.

Upvotes: 0

NDNG

Reputation: 1

Short answer: install PyTorch built with CUDA 11.1 or lower.

Long answer: Unfortunately, I cannot explain why this happens, but after experimenting with different distributions (Ubuntu and Debian) and PyTorch installation methods (pip and conda), it seems that CUDA 11.3, the only CUDA 11.x build shipped with PyTorch on conda, does not work (CUDA 10.2 works just fine).

Solution: install it with pip, choosing the version you want from the official "previous PyTorch versions" page.

At the time of writing, the highest PyTorch version with the highest CUDA version that works on WSL2 can be installed with the following command:

pip install torch==1.10.1+cu111 torchvision==0.11.2+cu111 -f https://download.pytorch.org/whl/torch_stable.html
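After installing, the CUDA variant of a pip wheel can be read from the local version suffix of torch.__version__ (for example +cu111). A small sketch, assuming the +cuXYZ naming used by the wheels in the command above; `cuda_tag` is a hypothetical helper:

```python
import re


def cuda_tag(version):
    """Split a wheel version like '1.10.1+cu111' into ('1.10.1', '11.1').

    Returns (version, None) when there is no +cuXYZ suffix; in that case
    check torch.version.cuda instead.
    """
    base, _, local = version.partition("+")
    # 'cu111' -> major '11', minor '1'; the last digit is the minor version.
    match = re.fullmatch(r"cu(\d+)(\d)", local)
    if match is None:
        return base, None
    return base, f"{match.group(1)}.{match.group(2)}"


print(cuda_tag("1.10.1+cu111"))  # ('1.10.1', '11.1')
print(cuda_tag("1.10.0+cu113"))  # ('1.10.0', '11.3')
print(cuda_tag("1.10.0"))        # ('1.10.0', None)
```

In practice you would pass `torch.__version__` to it.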

Upvotes: 0

Scotrraaj Gopal

Reputation: 114

I got the same warning as @Homer Simpson when I ran the command sudo ldconfig.

I dealt with it the same way @Homer Simpson posted. In essence, you need to delete libcuda.so and libcuda.so.1 and recreate them, this time as symbolic links to libcuda.so.1.1:

# Run CMD in Windows (as Administrator)
C:
cd \Windows\System32\lxss\lib
del libcuda.so
del libcuda.so.1
mklink libcuda.so libcuda.so.1.1
mklink libcuda.so.1 libcuda.so.1.1

# Open WSL bash
wsl -e /bin/bash
sudo ldconfig

Ref: https://github.com/microsoft/WSL/issues/5548#issuecomment-912495487
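To confirm from inside WSL that the links were recreated correctly, something like the following can be run before or after sudo ldconfig. It is a sketch that assumes the Windows links show up as ordinary symlinks under /usr/lib/wsl/lib (which is what the ldconfig warning is about); `check_libcuda_links` is a hypothetical helper:

```python
import os


def check_libcuda_links(lib_dir="/usr/lib/wsl/lib", target="libcuda.so.1.1"):
    """Report whether libcuda.so and libcuda.so.1 are symlinks to `target`.

    Returns {name: bool}. False means the entry is missing, is a regular
    file (the case ldconfig warns about), or points somewhere else.
    """
    result = {}
    for name in ("libcuda.so", "libcuda.so.1"):
        path = os.path.join(lib_dir, name)
        # islink() is False for missing paths, so readlink() is only
        # called on entries that really are symlinks.
        result[name] = (os.path.islink(path)
                        and os.path.basename(os.readlink(path)) == target)
    return result


if __name__ == "__main__":
    print(check_libcuda_links())
```

Both values should be `True` once the mklink steps above have been applied.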

Upvotes: 2

Homer Simpson

Reputation: 41

In my case, I solved this issue by linking /usr/lib/wsl/lib/libcuda.so.1 to the libcuda.so in the WSL2 CUDA location. See https://github.com/microsoft/WSL/issues/5663. After a reboot, PyTorch can find the GPU.

(I found the warning "/usr/lib/wsl/lib/libcuda.so.1 is not a symbolic link" while running apt-get upgrade; I'm not sure you can solve it the same way.) Downgrading to PyTorch 1.8.2 LTS can also solve the problem, but computation is extremely slow.

Upvotes: 4

skankhunt76

Reputation: 520

I ran into the same issue and solved it by downgrading PyTorch from 1.10 to 1.8.2 LTS.

Upvotes: 3
