Reputation: 3070
When trying to run some PyTorch code I get this error:
THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=74 error=38 : no CUDA-capable device is detected
Traceback (most recent call last):
File "demo.py", line 173, in test
pca = torch.FloatTensor( np.load('../basics/U_lrw1.npy')[:,:6]).cuda()
RuntimeError: cuda runtime error (38) : no CUDA-capable device is detected at /pytorch/aten/src/THC/THCGeneral.cpp:74
I am running a cloud virtual machine using the 'Google Deep Learning VM', Version: tf-gpu.1-13.m25, based on Debian GNU/Linux 9.9 (stretch), kernel: Linux tf-gpu-interruptible 4.9.0-9-amd64 #1 SMP Debian 4.9.168-1 (2019-04-12) x86_64
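For what it's worth, a quick way to check whether PyTorch can see the GPU at all, independent of the failing script (assuming the same Python environment with torch installed):

```shell
# Does PyTorch see any CUDA device? Prints the CUDA build version,
# whether a device is available, and how many devices are visible.
python -c "import torch; print(torch.version.cuda, torch.cuda.is_available(), torch.cuda.device_count())"

# Cross-check against the driver-level view of the hardware.
nvidia-smi -L
```

If `torch.cuda.is_available()` is False while `nvidia-smi -L` lists the K80, the driver sees the card but the CUDA runtime in the Python environment cannot reach it.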
Environment info:
$ nvidia-smi
Sun May 26 05:32:33 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.72 Driver Version: 410.72 CUDA Version: 10.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K80 Off | 00000000:00:04.0 Off | 0 |
| N/A 42C P0 74W / 149W | 0MiB / 11441MiB | 100% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
$ echo $CUDA_PATH
$ echo $LD_LIBRARY_PATH
/usr/local/cuda/lib64:/usr/local/nccl2/lib:/usr/local/cuda/extras/CUPTI/lib64
$ env | grep CUDA
CUDA_VISIBLE_DEVICES=0
$ pip freeze
audioread==2.1.7
backports.functools-lru-cache==1.5
certifi==2019.3.9
chardet==3.0.4
cloudpickle==1.1.1
cycler==0.10.0
dask==1.2.2
decorator==4.4.0
dlib==19.17.0
enum34==1.1.6
filelock==3.0.12
funcsigs==1.0.2
future==0.17.1
gdown==3.8.1
idna==2.8
joblib==0.13.2
kiwisolver==1.1.0
librosa==0.6.3
llvmlite==0.28.0
Upvotes: 4
Views: 8856
Reputation: 2337
I couldn't work out the root cause of your problem, but I noticed one thing: GPU-Util is at 100% even though no processes are running.
You can try the following:

Enable persistence mode with sudo nvidia-smi -pm 1. This might solve your problem: the combination of ECC with persistence mode disabled can lead to 100% GPU utilization.

You can also disable ECC with the command sudo nvidia-smi -e 0 (the change takes effect after a reboot).
Or, best of all, reboot the operating system and restart the whole process from scratch.
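Putting the suggestions above together as a minimal sketch (assuming a Debian-style VM where you have sudo and the NVIDIA driver installed; -pm and -e are standard nvidia-smi flags):

```shell
# Enable persistence mode so the driver stays loaded between CUDA calls
sudo nvidia-smi -pm 1

# Optionally disable ECC; this only takes effect after the next reboot
sudo nvidia-smi -e 0

# If neither helps, reboot the VM entirely
sudo reboot
```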
Note: I'm not sure whether this will work for you. I faced a similar issue earlier, so I'm just speaking from experience. Hope this helps.
Upvotes: 1