Reputation: 139
I have been struggling for days to make torch work on WSL2 with an RTX 3080.
I installed CUDA Toolkit version 11.3.
Running nvcc -V
returns this:
nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Mar_21_19:15:46_PDT_2021
Cuda compilation tools, release 11.3, V11.3.58
Build cuda_11.3.r11.3/compiler.29745058_0
nvidia-smi
returns this
Mon Nov 29 00:38:26 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.00 Driver Version: 510.06 CUDA Version: 11.6 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... On | 00000000:01:00.0 On | N/A |
| N/A 52C P5 17W / N/A | 1082MiB / 16384MiB | N/A Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
I verified the toolkit installation with the BlackScholes CUDA sample:
./BlackScholes
[./BlackScholes] - Starting...
GPU Device 0: "Ampere" with compute capability 8.6
Initializing data...
...allocating CPU memory for options.
...allocating GPU memory for options.
...generating input data in CPU mem.
...copying input data to GPU mem.
Data init done.
Executing Black-Scholes GPU kernel (512 iterations)...
Options count : 8000000
BlackScholesGPU() time : 0.242822 msec
Effective memory bandwidth: 329.459087 GB/s
Gigaoptions per second : 32.945909
BlackScholes, Throughput = 32.9459 GOptions/s, Time = 0.00024 s, Size = 8000000 options, NumDevsUsed = 1, Workgroup = 128
Reading back GPU results...
Checking the results...
...running CPU calculations.
Comparing the results...
L1 norm: 1.741792E-07
Max absolute error: 1.192093E-05
Shutting down...
...releasing GPU memory.
...releasing CPU memory.
Shutdown done.
[BlackScholes] - Test Summary
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
Test passed
However, when I try to use torch, it doesn't find any GPU. By the way, I had to install torch==1.10.0+cu113 to use torch with my RTX 3080, because the sm_ architectures compiled into the plain 1.10.0 build are not compatible with the RTX 3080.
Running torch returns this:
>>> import torch
>>> torch.version
<module 'torch.version' from '/home/snihar/miniconda3/envs/tscbasset/lib/python3.7/site-packages/torch/version.py'>
>>> torch.version.cuda
'11.3'
>>> torch.cuda.get_arch_list()
[]
>>> torch.cuda.device_count()
0
>>> torch.cuda.current_device()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/snihar/miniconda3/envs/tscbasset/lib/python3.7/site-packages/torch/cuda/__init__.py", line 479, in current_device
_lazy_init()
File "/home/snihar/miniconda3/envs/tscbasset/lib/python3.7/site-packages/torch/cuda/__init__.py", line 214, in _lazy_init
torch._C._cuda_init()
RuntimeError: No CUDA GPUs are available
Finally, and interestingly, I can run tensorflow-gpu on the same machine without any problem.
I installed PyTorch like this: conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
I also managed to run PyTorch in a Docker container started from my WSL2 machine with this command:
sudo docker run --gpus all -it --rm -v /home/...:/home/... nvcr.io/nvidia/pytorch:21.11-py3
PyTorch also works on the Windows machine that I run WSL from. Both return ['sm_37', 'sm_50', 'sm_60', 'sm_61', 'sm_70', 'sm_75', 'sm_80', 'sm_86', 'compute_37'] for torch.cuda.get_arch_list(), which shows that those builds are compatible with the RTX 3080 (sm_86).
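The checks from the interpreter session above can be gathered into one small script, which makes it easier to compare the conda environment, the Docker container, and the Windows install side by side. This is only a sketch: the function name cuda_report is made up for illustration, and it relies only on the torch calls already shown in the question plus torch.cuda.is_available().

```python
def cuda_report():
    """Collect what this environment's torch build reports about CUDA."""
    try:
        import torch
    except ImportError:
        # torch is not installed in this environment; nothing more to check.
        return {"torch_installed": False}

    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version the wheel was built against (None for CPU-only builds).
        "built_for_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
        "device_count": torch.cuda.device_count(),
        # Empty list when no GPU is visible, as in the question.
        "arch_list": torch.cuda.get_arch_list(),
    }


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

Running the same file in each environment and diffing the output should show whether the difference lies in the torch build (built_for_cuda, arch_list) or in GPU visibility (cuda_available, device_count).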
Upvotes: 12
Views: 10001
Reputation: 624
In my case I used the Docker image nvidia/cuda:12.3.2-devel-ubuntu22.04 and torch.cuda.is_available() returned False. After switching to ubuntu:22.04, the problem was gone.
As stated at https://docs.nvidia.com/cuda/wsl-user-guide/index.html, having a Linux GPU driver preinstalled under WSL is not recommended.
Upvotes: 0
Reputation: 11
Note that you need to use cmd.exe, not PowerShell, because mklink is a built-in command of cmd.exe rather than a standalone program.
Upvotes: 0
Reputation: 1
Short: install PyTorch with CUDA 11.1 or lower.
Long: Unfortunately I cannot explain why this happens, but after experimenting with different distros (Ubuntu and Debian) and PyTorch installs (pip and conda), it seems that CUDA 11.3, which is the only 11.x CUDA build shipped with PyTorch on conda, does not work (CUDA 10.2 works just fine).
Solution: install it with pip, picking the version you want from the official previous PyTorch versions page.
At the time of writing, the highest PyTorch version with the highest working CUDA version on WSL2 can be installed with the following command:
pip install torch==1.10.1+cu111 torchvision==0.11.2+cu111 -f https://download.pytorch.org/whl/torch_stable.html
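To confirm afterwards which CUDA variant pip actually installed, the local version tag after the "+" can be decoded and compared with torch.version.cuda. The helper below is a hypothetical sketch, assuming the cuXYZ naming seen in the wheel names in this thread (cu111 for CUDA 11.1, cu113 for CUDA 11.3), where the last digit is the minor version.

```python
import re


def cuda_from_wheel_version(version):
    """Decode '1.10.1+cu111' into '11.1'; return None for untagged or CPU builds."""
    match = re.search(r"\+cu(\d{2,})$", version)
    if match is None:
        return None
    digits = match.group(1)
    # Assumption: the last digit is the minor version, the rest is the major.
    return f"{digits[:-1]}.{digits[-1]}"


if __name__ == "__main__":
    print(cuda_from_wheel_version("1.10.1+cu111"))
```

In an environment with torch installed, the argument would be torch.__version__.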
Upvotes: 0
Reputation: 114
I got the same warning as @Homer Simpson when I ran the command sudo ldconfig.
I dealt with it the same way that @Homer Simpson posted. In essence, you need to delete libcuda.so and libcuda.so.1 and recreate them, this time as symbolic links to libcuda.so.1.1:
# Run CMD in Windows (as Administrator)
C:
cd \Windows\System32\lxss\lib
del libcuda.so
del libcuda.so.1
mklink libcuda.so libcuda.so.1.1
mklink libcuda.so.1 libcuda.so.1.1
# Open WSL bash
wsl -e /bin/bash
sudo ldconfig
Ref: https://github.com/microsoft/WSL/issues/5548#issuecomment-912495487
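Before and after recreating the links, it can help to see which libcuda.so* entries are real files and which are symbolic links. The sketch below assumes the Windows directory above shows up inside WSL as /usr/lib/wsl/lib (the path mentioned in another answer here); the function name describe_libcuda is made up for illustration.

```python
import os


def describe_libcuda(lib_dir="/usr/lib/wsl/lib"):
    """Return (name, kind, link_target) for each libcuda.so* entry in lib_dir."""
    entries = []
    if not os.path.isdir(lib_dir):
        # Directory missing, e.g. when not running inside WSL.
        return entries
    for name in sorted(os.listdir(lib_dir)):
        if not name.startswith("libcuda.so"):
            continue
        path = os.path.join(lib_dir, name)
        if os.path.islink(path):
            entries.append((name, "symlink", os.readlink(path)))
        else:
            entries.append((name, "regular file", None))
    return entries


if __name__ == "__main__":
    for name, kind, target in describe_libcuda():
        print(name, kind, target or "")
```

If the fix above took effect, libcuda.so and libcuda.so.1 should be listed as symlinks pointing at libcuda.so.1.1.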
Upvotes: 2
Reputation: 41
In my case, I solved this issue by linking /usr/lib/wsl/lib/libcuda.so.1 to the libcuda.so in the WSL2 CUDA location. See https://github.com/microsoft/WSL/issues/5663
After a reboot, PyTorch can find the GPU.
(I noticed the warning "/usr/lib/wsl/lib/libcuda.so.1 is not a symbolic link" while running apt-get upgrade. I am not sure whether your case can be solved the same way.) Downgrading to PyTorch 1.8.2 LTS can also solve the problem, but computation is then extremely slow.
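To check whether the driver library itself loads and sees a GPU, independently of PyTorch, one option is to call the CUDA driver API through ctypes. This is a rough sketch: probe_cuda_driver is a made-up name, and it assumes libcuda.so.1 can be found by the dynamic loader. cuInit and cuDeviceGetCount are the driver API entry points; a return value of 0 means CUDA_SUCCESS.

```python
import ctypes


def probe_cuda_driver(lib_name="libcuda.so.1"):
    """Try to load the CUDA driver library and count devices, without torch."""
    try:
        lib = ctypes.CDLL(lib_name)
    except OSError as exc:
        # Library not found or not loadable (for example, a broken link).
        return {"loaded": False, "error": str(exc)}

    # cuInit(0) has to succeed before any other driver API call.
    status = lib.cuInit(0)
    if status != 0:
        return {"loaded": True, "cuInit": status}

    count = ctypes.c_int(0)
    status = lib.cuDeviceGetCount(ctypes.byref(count))
    return {
        "loaded": True,
        "cuInit": 0,
        "cuDeviceGetCount": status,
        "devices": count.value,
    }


if __name__ == "__main__":
    print(probe_cuda_driver())
```

If this reports a device while torch.cuda.device_count() still returns 0, the problem is more likely in how the torch build locates the library than in the driver itself.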
Upvotes: 4
Reputation: 520
I ran into the same problem and solved it by downgrading PyTorch from 1.10 to 1.8.2 LTS.
Upvotes: 3