Reputation: 1028
I previously had CUDA 9.x running on this Windows 10 64-bit Home system (targeting a 1080 Ti card), but I need to update to CUDA 10.0 for TensorFlow 2. I initially thought TF2 was OK with CUDA 10.1, so I installed 10.1 first and only later found out that it must be CUDA 10.0.
Can't get it to work...
To test TF, I ran this to validate the installation (Jupyter notebook via Anaconda, freshly built TF2 environment):
import tensorflow as tf
print(tf.reduce_sum(tf.random.normal([1000, 1000])))
I get this error in the basic Python test
InternalError: cudaGetDevice() failed. Status: cudaGetErrorString symbol not found
This suggests that a key file cannot be found, but I can't work out the root cause - and there are very few hits on that error info, none of which helped me.
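One way to narrow down a "symbol not found" message is to ask Python directly whether a given runtime library can be loaded and whether it exports the function named in the error. The sketch below is only a diagnostic aid, not what TensorFlow does internally. The file name cudart64_100.dll is an assumption about how the CUDA 10.0 runtime is named on 64-bit Windows, and probe_symbol is a hypothetical helper. Whether a bare file name is found also depends on the interpreter's DLL search rules, so "missing-library" is a hint rather than proof.

```python
import ctypes


def probe_symbol(library_name, symbol_name):
    """Return 'missing-library', 'missing-symbol', or 'ok'."""
    try:
        lib = ctypes.CDLL(library_name)
    except OSError:
        # The loader could not find or open the library at all.
        return "missing-library"
    # ctypes looks up exported functions on attribute access, so
    # hasattr() is False when the library does not export the symbol.
    return "ok" if hasattr(lib, symbol_name) else "missing-symbol"


if __name__ == "__main__":
    # Assumed file name of the CUDA 10.0 runtime on 64-bit Windows.
    print(probe_symbol("cudart64_100.dll", "cudaGetErrorString"))
```

Running this from the same Anaconda environment as the notebook keeps the search path comparable to the one the failing test used.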
Current Config
CUDA 10.0 installed
Nvidia driver 436.48 (Game Ready driver)
Potential issues & resolution actions so far
Obviously none of them have fixed things
Known Oddities
[superseded by Update 2] nvidia-smi.exe reports CUDA 10.1 (yes, it is available on my Win 10), but checking through the registry I can't find anything to suggest CUDA 10.1 is lingering there. Update: found it in C:\Windows\System32
Despite uninstalls, I still have CudaXYZWizardsPackage in the registry under the key Computer\HKEY_USERS\.DEFAULT\Software\Microsoft\VisualStudio\14.0_Config\InstalledProducts with XYZ = 90, 91, 100, 101, but I doubt this is the issue for TF in Python ;) Update: there is nothing in C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\Common7\IDE\Extensions\NVIDIA except for 10.0, so these are just orphaned registry entries.
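Since leftovers from earlier installs are the main suspect here, it can help to list which CUDA runtime DLLs are actually reachable through PATH. The following is a minimal sketch: find_cuda_runtimes is a hypothetical helper, and the cudart64_*.dll pattern is an assumption about how the runtime files are named on 64-bit Windows.

```python
import glob
import os


def find_cuda_runtimes(directories, pattern="cudart64_*.dll"):
    """Map each existing directory to the matching file names found in it."""
    found = {}
    for directory in directories:
        # Skip empty PATH entries and directories that no longer exist.
        if not directory or not os.path.isdir(directory):
            continue
        # Escape the directory so characters like '[' are taken literally.
        search = os.path.join(glob.escape(directory), pattern)
        matches = sorted(os.path.basename(p) for p in glob.glob(search))
        if matches:
            found[directory] = matches
    return found


if __name__ == "__main__":
    path_dirs = os.environ.get("PATH", "").split(os.pathsep)
    for directory, names in find_cuda_runtimes(path_dirs).items():
        print(directory, names)
```

If more than one CUDA version shows up, the order of the directories in PATH may decide which copy gets loaded first.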
Other info
Questions
Upvotes: 4
Views: 13295
Reputation: 1028
This is mostly an extended comment, since @diego asked for updates...
I now have CUDA 10.0 installed, and the NVIDIA Control Panel reports nvcuda.dll as v10.0.132.
I have built the recommended demo devicequery.exe using Visual Studio 2017 from the vs solution in C:\ProgramData\NVIDIA Corporation\CUDA Samples\v10.0\1_Utilities\deviceQuery (note that the .exe ends up in C:\ProgramData\NVIDIA Corporation\CUDA Samples\v10.0\bin\win64\Debug)
The program then ran from a cmd prompt and gave the following output.
devicequery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX 1080 Ti"
  CUDA Driver Version / Runtime Version          10.0 / 10.0
  CUDA Capability Major/Minor version number:    6.1
  Total amount of global memory:                 11264 MBytes (11811160064 bytes)
  (28) Multiprocessors, (128) CUDA Cores/MP:     3584 CUDA Cores
  GPU Max Clock rate:                            1607 MHz (1.61 GHz)
  Memory Clock rate:                             5505 Mhz
  Memory Bus Width:                              352-bit
  L2 Cache Size:                                 2883584 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z):  (1024, 1024, 64)
  Max dimension size of a grid size (x,y,z):     (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  CUDA Device Driver Mode (TCC or WDDM):         WDDM (Windows Display Driver Model)
  Device supports Unified Addressing (UVA):      Yes
  Device supports Compute Preemption:            No
  Supports Cooperative Kernel Launch:            No
  Supports MultiDevice Co-op Kernel Launch:      No
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.0, CUDA Runtime Version = 10.0, NumDevs = 1
Result = PASS
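The part of this output that matters most for the TensorFlow problem is the final summary line: the driver version should be at least as new as the runtime version, and the result should be PASS. A small sketch for checking that from saved output follows. parse_device_query_summary is a hypothetical helper, and the regular expression assumes the summary line has exactly the wording shown above.

```python
import re


def parse_device_query_summary(text):
    """Extract driver/runtime versions and the PASS flag from deviceQuery output."""
    match = re.search(
        r"CUDA Driver Version = (\d+)\.(\d+), "
        r"CUDA Runtime Version = (\d+)\.(\d+)",
        text,
    )
    if match is None:
        return None
    d_major, d_minor, r_major, r_minor = (int(g) for g in match.groups())
    driver = (d_major, d_minor)
    runtime = (r_major, r_minor)
    return {
        "driver": driver,
        "runtime": runtime,
        "passed": "Result = PASS" in text,
        # Tuples compare element by element, so (10, 1) >= (10, 0) is True.
        "driver_supports_runtime": driver >= runtime,
    }


if __name__ == "__main__":
    sample = (
        "deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.0, "
        "CUDA Runtime Version = 10.0, NumDevs = 1\nResult = PASS"
    )
    print(parse_device_query_summary(sample))
```

A passing deviceQuery only shows that the driver and toolkit agree with each other; it does not by itself confirm that TensorFlow loads the same libraries.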
What did I do to achieve this? Hard to be specific because I didn't realise I had succeeded, but I recall setting the display driver to VGA, rebooting (twice for safety) then uninstalling CUDA 10.0, rebooting then installing 10.0.
I did just notice that I built deviceQuery with a vs 2012 solution, but I did agree to VS updating on solution open.
Upvotes: 1
Reputation: 812
download and install Anaconda (Python 3.7): https://www.anaconda.com/distribution/
in Command Prompt:
conda update conda
conda update python
conda create --name tensorflow-gpu
conda activate tensorflow-gpu
conda install pip jupyter
pip install tensorflow-gpu
conda install cudatoolkit=10.0 -c pytorch
Anaconda3 (64-bit) -> Jupyter Notebook (tensorflow-gpu)
import tensorflow as tf
%%time
with tf.device('/CPU:0'):
    a = tf.random.uniform([1000,1000])
    b = tf.random.uniform([1000,1000])
    c = tf.matmul(a, b)
Wall time: 18.9 ms
%%time
with tf.device('/GPU:0'):
    a = tf.random.uniform([1000,1000])
    b = tf.random.uniform([1000,1000])
    c = tf.matmul(a, b)
Wall time: 2.99 ms
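A single %%time reading can include one-time setup cost, so the two wall times above are best read as a rough sanity check rather than a benchmark. For a steadier comparison, the measured call can be warmed up and repeated. The helper below is a generic sketch using only the standard library; best_of is a hypothetical name and is not part of TensorFlow or Jupyter.

```python
import time


def best_of(fn, repeats=5, warmup=1):
    """Call fn a few times untimed, then return the fastest timed run in seconds."""
    for _ in range(warmup):
        fn()  # Untimed calls absorb one-time initialisation cost.
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


if __name__ == "__main__":
    # Stand-in workload; replace with a function that runs the matmul.
    print(best_of(lambda: sum(range(100000))))
```

When timing the TensorFlow cells, the function passed in should also fetch the result (for example by calling c.numpy() on the product) so that the measurement covers the finished computation and not just the dispatch of the operation.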
Upvotes: 2