Why does my CUDA work for Pytorch but not for Tensorflow suddenly?

Question

The machine I'm using is with Titan XP and running Ubuntu 18.10. I'm not the owner so I'm not sure how it was configured previously. The cuda version is 9.*, most likely 9.0. There is no folder like /usr/local/cuda. Though it sounds strange (because no Cuda is compatible with 18.10), previously it worked pretty well both for Tensorflow and Pytorch. Now, when running tensorflow-gpu v1.12.0 in python 2.7, cudatoolkit 9.2 and cudnn 7.2.1 (this worked well previously without any change), it reports:

ImportError: libcublas.so.9.0: cannot open shared object file: No such file of directory

But, when I change my conda env to python 3.6 with pytorch 0.4.1 , cudatoolkit 9.0 and cudnn 7.6 (they are shown in pycharm). There is:

torch.cuda.is_available() # True

This shows that GPU is running in Pytorch code. Also I've checked GPU RAM by nvidia-smi, when Pytorch is running, RAM is occupied.

Although there is no Cuda folder like /usr/local/cuda/, when I run:

nvcc - V

There is:

Cuda compilation tools, release 9.1, V9.1.85

Can someone give me a hint about how these strange things happen? What should I do to make my tensorflow-gpu works? I get totally confused orz.

Michael · Accepted Answer

Anaconda environments install their own version of the CUDA toolkit when you install things like pytorch and tensorflow-gpu with conda. That looks like it's how your Python 3.6 environment was set up. Is your 2.7 version of Python a system install or part of another Python environment? It's possible that your Tensorflow was built against a CUDA toolkit that is no longer installed, for whatever reason, or in any case that you were trying to use Tensorflow while not having the path to the libraries that it was built against in your LD_LIBRARY_PATH (perhaps because of an unusual install location)

You can type which nvcc to see which part of your PATH is currently pointing to that executable. That will tell you where your CUDA toolkit is installed. I'm guessing that your PATH was still pointing to a conda environment when you last ran nvcc, or to some version of the CUDA toolkit in an unusual install location in any case.

First, I'd suggest abandoning any effort to use your system python with Tensorflow. My suggestion is to either modify or create a new conda environment and install tensorflow-gpu with conda, which will also install the CUDA toolkit for that environment. Note that your CUDA install will not be in /usr/local/cuda if you go down this path, it'll be located inside your conda environment instead.

Why does my CUDA work for Pytorch but not for Tensorflow suddenly?

Answers (1)

Related Questions