Reputation: 387
So, I have a machine with a RTX 2060, and I want to run tensorflow on it. However, the error, Failed to get convolution algorithm, is showing up despite me installing cudNN on it.
I have Tensorflow-GPU 1.13.1 running on my Linux (Xubuntu 18.04) machine. I have followed the instructions on the site (which are below), and have installed via pip tensorflow-gpu.
Instructions I've followed:
# Add NVIDIA package repositories
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-repo-ubuntu1804_10.0.130-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1804_10.0.130-1_amd64.deb
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
sudo apt-get update
wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo apt install ./nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo apt-get update
# Install NVIDIA driver
sudo apt-get install --no-install-recommends nvidia-driver-410
# Reboot. Check that GPUs are visible using the command: nvidia-smi
# Install development and runtime libraries (~4GB)
sudo apt-get install --no-install-recommends \
cuda-10-0 \
libcudnn7=7.4.1.5-1+cuda10.0 \
libcudnn7-dev=7.4.1.5-1+cuda10.0
https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-repo-ubuntu1804_10.0.130-1_amd64.deb sudo dpkg -i cuda-repo-ubunt
# Install TensorRT. Requires that libcudnn7 is installed above.
sudo apt-get update && \
sudo apt-get install nvinfer-runtime-trt-repo-ubuntu1804-5.0.2-ga-cuda10.0 \
&& sudo apt-get update \
&& sudo apt-get install -y --no-install-recommends libnvinfer-dev=5.0.2-1+cuda10.0
Error I get:
2019-03-25 23:16:50.938950: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
2019-03-25 23:16:52.732720: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-03-25 23:16:52.736377: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
Traceback (most recent call last):
File "/home/user/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
return fn(*args)
File "/home/user/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/home/user/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[{{node ppo/actions-and-internals/layered-network/apply/conv2d0/apply/Conv2D}}]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "start.py", line 54, in <module>
main()
File "start.py", line 51, in main
main_loop(agent, curiousity_engine)
File "start.py", line 23, in main_loop
action1 = agent.act(states=get_screen())
File "/home/user/.local/lib/python3.6/site-packages/tensorforce/agents/agent.py", line 148, in act
independent=independent
File "/home/user/.local/lib/python3.6/site-packages/tensorforce/models/model.py", line 1393, in act
fetch_list = self.monitored_session.run(fetches=fetches, feed_dict=feed_dict)
File "/home/user/.local/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 676, in run
run_metadata=run_metadata)
File "/home/user/.local/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1270, in run
raise six.reraise(*original_exc_info)
File "/home/user/.local/lib/python3.6/site-packages/six.py", line 693, in reraise
raise value
File "/home/user/.local/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1255, in run
return self._sess.run(*args, **kwargs)
File "/home/user/.local/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1327, in run
run_metadata=run_metadata)
File "/home/user/.local/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1091, in run
return self._sess.run(*args, **kwargs)
File "/home/user/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 929, in run
run_metadata_ptr)
File "/home/user/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1152, in _run
feed_dict_tensor, options, run_metadata)
File "/home/user/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
run_metadata)
File "/home/user/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node ppo/actions-and-internals/layered-network/apply/conv2d0/apply/Conv2D (defined at /home/user/.local/lib/python3.6/site-packages/tensorforce/core/networks/layer.py:1079) ]]
Caused by op 'ppo/actions-and-internals/layered-network/apply/conv2d0/apply/Conv2D', defined at:
File "start.py", line 54, in <module>
main()
File "start.py", line 41, in main
agent, user_input = agent_build()
File "/home/user/Downloads/v2 (2)/agent.py", line 37, in agent_build
actions_exploration = 'epsilon_decay'
File "/home/user/.local/lib/python3.6/site-packages/tensorforce/agents/ppo_agent.py", line 155, in __init__
entropy_regularization=entropy_regularization
File "/home/user/.local/lib/python3.6/site-packages/tensorforce/agents/learning_agent.py", line 141, in __init__
batching_capacity=batching_capacity
File "/home/user/.local/lib/python3.6/site-packages/tensorforce/agents/agent.py", line 80, in __init__
self.model = self.initialize_model()
File "/home/user/.local/lib/python3.6/site-packages/tensorforce/agents/ppo_agent.py", line 183, in initialize_model
likelihood_ratio_clipping=self.likelihood_ratio_clipping
File "/home/user/.local/lib/python3.6/site-packages/tensorforce/models/pg_prob_ratio_model.py", line 88, in __init__
gae_lambda=gae_lambda
File "/home/user/.local/lib/python3.6/site-packages/tensorforce/models/pg_model.py", line 98, in __init__
requires_deterministic=False
File "/home/user/.local/lib/python3.6/site-packages/tensorforce/models/distribution_model.py", line 90, in __init__
discount=discount
File "/home/user/.local/lib/python3.6/site-packages/tensorforce/models/memory_model.py", line 114, in __init__
reward_preprocessing=reward_preprocessing
File "/home/user/.local/lib/python3.6/site-packages/tensorforce/models/model.py", line 217, in __init__
self.setup()
File "/home/user/.local/lib/python3.6/site-packages/tensorforce/models/model.py", line 290, in setup
independent=independent
File "/home/user/.local/lib/python3.6/site-packages/tensorforce/models/memory_model.py", line 605, in create_operations
independent=independent
File "/home/user/.local/lib/python3.6/site-packages/tensorforce/models/model.py", line 1193, in create_operations
independent=independent
File "/home/user/.local/lib/python3.6/site-packages/tensorforce/models/model.py", line 1019, in create_act_operations
deterministic=deterministic
File "/home/user/.local/lib/python3.6/site-packages/tensorflow/python/ops/template.py", line 368, in __call__
return self._call_func(args, kwargs)
File "/home/user/.local/lib/python3.6/site-packages/tensorflow/python/ops/template.py", line 311, in _call_func
result = self._func(*args, **kwargs)
File "/home/user/.local/lib/python3.6/site-packages/tensorforce/models/distribution_model.py", line 187, in tf_actions_and_internals
return_internals=True
File "/home/user/.local/lib/python3.6/site-packages/tensorflow/python/ops/template.py", line 368, in __call__
return self._call_func(args, kwargs)
File "/home/user/.local/lib/python3.6/site-packages/tensorflow/python/ops/template.py", line 311, in _call_func
result = self._func(*args, **kwargs)
File "/home/user/.local/lib/python3.6/site-packages/tensorforce/core/networks/network.py", line 253, in tf_apply
x = layer.apply(x=x, update=update)
File "/home/user/.local/lib/python3.6/site-packages/tensorflow/python/ops/template.py", line 368, in __call__
return self._call_func(args, kwargs)
File "/home/user/.local/lib/python3.6/site-packages/tensorflow/python/ops/template.py", line 311, in _call_func
result = self._func(*args, **kwargs)
File "/home/user/.local/lib/python3.6/site-packages/tensorforce/core/networks/layer.py", line 1079, in tf_apply
x = tf.nn.conv2d(input=x, filter=self.filters, strides=(1, stride_h, stride_w, 1), padding=self.padding)
File "/home/user/.local/lib/python3.6/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 1026, in conv2d
data_format=data_format, dilations=dilations, name=name)
File "/home/user/.local/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
op_def=op_def)
File "/home/user/.local/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/home/user/.local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3300, in create_op
op_def=op_def)
File "/home/user/.local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1801, in __init__
self._traceback = tf_stack.extract_stack()
UnknownError (see above for traceback): Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node ppo/actions-and-internals/layered-network/apply/conv2d0/apply/Conv2D (defined at /home/user/.local/lib/python3.6/site-packages/tensorforce/core/networks/layer.py:1079) ]]
Upvotes: 0
Views: 6188
Reputation: 759
Initialize your code with following code:
import tensorflow as tf
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)
Check more details discussion here
Upvotes: 1
Reputation: 86
I've ran into the same problem(s) with the same setup. What I found was (if I recall correctly) that some of the later commands install a newer version of the driver. Matching the versions seems to be very critical. Also my mouse stopped working because some input package was de-installed.
The fiddling around cost me days and numerous clean installs... What worked in the end was installing the driver, cuda and cudnn manually. The process is far from optimal and my end result is not as neat as I would like to have it, but it works.
My versions: Driver: 410.48 Cuda: 10.0 cuDNN: 7.4.2 (TensorRt: pick one that uses cuDNN 7.4.2)
Besides that it was needed to add one of the following lines to the python tensorflow code:
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
tf.enable_eager_execution(config=config)
or
config = tf.ConfigProto()
# config.gpu_options.allow_growth = True
config.gpu_options.per_process_gpu_memory_fraction = 0.1
sess = tf.Session(config=config)
Upvotes: 2