ZeroMaxinumXZ

Reputation: 387

I've installed cuDNN, but the error "Failed to get convolution algorithm" still shows up

I have a machine with an RTX 2060 and want to run TensorFlow on it. However, the error "Failed to get convolution algorithm" shows up even though I have installed cuDNN.

I have tensorflow-gpu 1.13.1 running on my Linux (Xubuntu 18.04) machine. I followed the instructions on the site (reproduced below) and installed tensorflow-gpu via pip.

Instructions I've followed:


# Add NVIDIA package repositories
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-repo-ubuntu1804_10.0.130-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1804_10.0.130-1_amd64.deb
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
sudo apt-get update
wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo apt install ./nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo apt-get update

# Install NVIDIA driver
sudo apt-get install --no-install-recommends nvidia-driver-410
# Reboot. Check that GPUs are visible using the command: nvidia-smi

# Install development and runtime libraries (~4GB)
sudo apt-get install --no-install-recommends \
    cuda-10-0 \
    libcudnn7=7.4.1.5-1+cuda10.0  \
    libcudnn7-dev=7.4.1.5-1+cuda10.0

# Install TensorRT. Requires that libcudnn7 is installed above.
sudo apt-get update && \
        sudo apt-get install nvinfer-runtime-trt-repo-ubuntu1804-5.0.2-ga-cuda10.0 \
        && sudo apt-get update \
        && sudo apt-get install -y --no-install-recommends libnvinfer-dev=5.0.2-1+cuda10.0

Error I get:

2019-03-25 23:16:50.938950: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
2019-03-25 23:16:52.732720: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-03-25 23:16:52.736377: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
Traceback (most recent call last):
  File "/home/user/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
    return fn(*args)
  File "/home/user/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/user/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
     [[{{node ppo/actions-and-internals/layered-network/apply/conv2d0/apply/Conv2D}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "start.py", line 54, in <module>
    main()
  File "start.py", line 51, in main
    main_loop(agent, curiousity_engine)
  File "start.py", line 23, in main_loop
    action1 = agent.act(states=get_screen())
  File "/home/user/.local/lib/python3.6/site-packages/tensorforce/agents/agent.py", line 148, in act
    independent=independent
  File "/home/user/.local/lib/python3.6/site-packages/tensorforce/models/model.py", line 1393, in act
    fetch_list = self.monitored_session.run(fetches=fetches, feed_dict=feed_dict)
  File "/home/user/.local/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 676, in run
    run_metadata=run_metadata)
  File "/home/user/.local/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1270, in run
    raise six.reraise(*original_exc_info)
  File "/home/user/.local/lib/python3.6/site-packages/six.py", line 693, in reraise
    raise value
  File "/home/user/.local/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1255, in run
    return self._sess.run(*args, **kwargs)
  File "/home/user/.local/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1327, in run
    run_metadata=run_metadata)
  File "/home/user/.local/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1091, in run
    return self._sess.run(*args, **kwargs)
  File "/home/user/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/home/user/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/user/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
    run_metadata)
  File "/home/user/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
     [[node ppo/actions-and-internals/layered-network/apply/conv2d0/apply/Conv2D (defined at /home/user/.local/lib/python3.6/site-packages/tensorforce/core/networks/layer.py:1079) ]]

Caused by op 'ppo/actions-and-internals/layered-network/apply/conv2d0/apply/Conv2D', defined at:
  File "start.py", line 54, in <module>
    main()
  File "start.py", line 41, in main
    agent, user_input = agent_build()
  File "/home/user/Downloads/v2 (2)/agent.py", line 37, in agent_build
    actions_exploration = 'epsilon_decay'
  File "/home/user/.local/lib/python3.6/site-packages/tensorforce/agents/ppo_agent.py", line 155, in __init__
    entropy_regularization=entropy_regularization
  File "/home/user/.local/lib/python3.6/site-packages/tensorforce/agents/learning_agent.py", line 141, in __init__
    batching_capacity=batching_capacity
  File "/home/user/.local/lib/python3.6/site-packages/tensorforce/agents/agent.py", line 80, in __init__
    self.model = self.initialize_model()
  File "/home/user/.local/lib/python3.6/site-packages/tensorforce/agents/ppo_agent.py", line 183, in initialize_model
    likelihood_ratio_clipping=self.likelihood_ratio_clipping
  File "/home/user/.local/lib/python3.6/site-packages/tensorforce/models/pg_prob_ratio_model.py", line 88, in __init__
    gae_lambda=gae_lambda
  File "/home/user/.local/lib/python3.6/site-packages/tensorforce/models/pg_model.py", line 98, in __init__
    requires_deterministic=False
  File "/home/user/.local/lib/python3.6/site-packages/tensorforce/models/distribution_model.py", line 90, in __init__
    discount=discount
  File "/home/user/.local/lib/python3.6/site-packages/tensorforce/models/memory_model.py", line 114, in __init__
    reward_preprocessing=reward_preprocessing
  File "/home/user/.local/lib/python3.6/site-packages/tensorforce/models/model.py", line 217, in __init__
    self.setup()
  File "/home/user/.local/lib/python3.6/site-packages/tensorforce/models/model.py", line 290, in setup
    independent=independent
  File "/home/user/.local/lib/python3.6/site-packages/tensorforce/models/memory_model.py", line 605, in create_operations
    independent=independent
  File "/home/user/.local/lib/python3.6/site-packages/tensorforce/models/model.py", line 1193, in create_operations
    independent=independent
  File "/home/user/.local/lib/python3.6/site-packages/tensorforce/models/model.py", line 1019, in create_act_operations
    deterministic=deterministic
  File "/home/user/.local/lib/python3.6/site-packages/tensorflow/python/ops/template.py", line 368, in __call__
    return self._call_func(args, kwargs)
  File "/home/user/.local/lib/python3.6/site-packages/tensorflow/python/ops/template.py", line 311, in _call_func
    result = self._func(*args, **kwargs)
  File "/home/user/.local/lib/python3.6/site-packages/tensorforce/models/distribution_model.py", line 187, in tf_actions_and_internals
    return_internals=True
  File "/home/user/.local/lib/python3.6/site-packages/tensorflow/python/ops/template.py", line 368, in __call__
    return self._call_func(args, kwargs)
  File "/home/user/.local/lib/python3.6/site-packages/tensorflow/python/ops/template.py", line 311, in _call_func
    result = self._func(*args, **kwargs)
  File "/home/user/.local/lib/python3.6/site-packages/tensorforce/core/networks/network.py", line 253, in tf_apply
    x = layer.apply(x=x, update=update)
  File "/home/user/.local/lib/python3.6/site-packages/tensorflow/python/ops/template.py", line 368, in __call__
    return self._call_func(args, kwargs)
  File "/home/user/.local/lib/python3.6/site-packages/tensorflow/python/ops/template.py", line 311, in _call_func
    result = self._func(*args, **kwargs)
  File "/home/user/.local/lib/python3.6/site-packages/tensorforce/core/networks/layer.py", line 1079, in tf_apply
    x = tf.nn.conv2d(input=x, filter=self.filters, strides=(1, stride_h, stride_w, 1), padding=self.padding)
  File "/home/user/.local/lib/python3.6/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 1026, in conv2d
    data_format=data_format, dilations=dilations, name=name)
  File "/home/user/.local/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "/home/user/.local/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/home/user/.local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3300, in create_op
    op_def=op_def)
  File "/home/user/.local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1801, in __init__
    self._traceback = tf_stack.extract_stack()

UnknownError (see above for traceback): Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
     [[node ppo/actions-and-internals/layered-network/apply/conv2d0/apply/Conv2D (defined at /home/user/.local/lib/python3.6/site-packages/tensorforce/core/networks/layer.py:1079) ]]

Upvotes: 0

Views: 6188

Answers (2)

user1410665

Reputation: 759

Add the following at the start of your code:

import tensorflow as tf
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)

See the linked discussion for more details.

Upvotes: 1

Martijn Pot

Reputation: 86

I ran into the same problem(s) with the same setup. What I found was (if I recall correctly) that some of the later commands install a newer version of the driver. Matching the versions seems to be critical. Also, my mouse stopped working because some input package was uninstalled.

The fiddling around cost me days and numerous clean installs... What worked in the end was installing the driver, CUDA and cuDNN manually. The process is far from optimal and my end result is not as neat as I would like, but it works.

My versions:

Driver: 410.48
CUDA: 10.0
cuDNN: 7.4.2
TensorRT: pick one that uses cuDNN 7.4.2
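To compare an installation against the combination listed above, a small check along these lines may help. It is only a sketch: the helper names are hypothetical, the installed versions have to be collected by hand (for example from nvidia-smi and dpkg -l), and treating this answer's versions as the required ones is an assumption, since other combinations may work as well.

```python
# Known-good combination reported in this answer (an assumption, not an
# official compatibility table).
EXPECTED = {"driver": "410.48", "cuda": "10.0", "cudnn": "7.4.2"}


def version_prefix_matches(installed, expected):
    """True if `installed` starts with the same dotted components as `expected`."""
    want = expected.split(".")
    have = installed.split(".")
    return have[:len(want)] == want


def find_mismatches(installed, expected=EXPECTED):
    """Return {component: (installed, expected)} for each component that differs.

    `installed` is a plain dict filled in by hand, e.g. from the output of
    `nvidia-smi` (driver) and `dpkg -l | grep -i cudnn` (cuDNN).
    """
    problems = {}
    for name, want in expected.items():
        have = installed.get(name)
        if have is None or not version_prefix_matches(have, want):
            problems[name] = (have, want)
    return problems


# Example: cuDNN 7.4.1.5 is the version pinned in the question's install
# commands; the driver version here is a made-up value for illustration.
found = {"driver": "418.39", "cuda": "10.0.130", "cudnn": "7.4.1.5"}
for component, (have, want) in find_mismatches(found).items():
    print("%s: installed %s, expected %s" % (component, have, want))
```

Only the leading components are compared, so a CUDA version of 10.0.130 counts as matching 10.0, while cuDNN 7.4.1.5 is flagged against 7.4.2.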

In addition, I had to add one of the following snippets to the Python TensorFlow code:

import tensorflow as tf
config = tf.ConfigProto()
# Allocate GPU memory on demand instead of reserving it all up front
config.gpu_options.allow_growth = True
tf.enable_eager_execution(config=config)

or

import tensorflow as tf
config = tf.ConfigProto()
# config.gpu_options.allow_growth = True
# Cap this process at 10% of the GPU's memory
config.gpu_options.per_process_gpu_memory_fraction = 0.1
sess = tf.Session(config=config)
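To see whether either setting actually lets cuDNN initialize, the two options can be wrapped in a helper and followed by a tiny convolution. This is a sketch under assumptions: it targets the TensorFlow 1.x API used in this thread, and gpu_settings and smoke_test_conv are hypothetical names introduced here, not part of TensorFlow.

```python
def gpu_settings(allow_growth=True, memory_fraction=None):
    """Collect the GPU option values to apply (plain dict, no TensorFlow needed)."""
    if memory_fraction is not None and not 0.0 < memory_fraction <= 1.0:
        raise ValueError("memory_fraction must be in (0, 1]")
    settings = {"allow_growth": bool(allow_growth)}
    if memory_fraction is not None:
        settings["per_process_gpu_memory_fraction"] = float(memory_fraction)
    return settings


def smoke_test_conv(settings):
    """Run one small Conv2D with the given GPU options (TensorFlow 1.x assumed).

    If cuDNN cannot create its handle, sess.run raises the same
    "Failed to get convolution algorithm" error as in the question.
    """
    import tensorflow as tf  # imported lazily so gpu_settings works without it

    config = tf.ConfigProto()
    for name, value in settings.items():
        setattr(config.gpu_options, name, value)

    x = tf.zeros([1, 8, 8, 1])  # one 8x8 single-channel image
    w = tf.zeros([3, 3, 1, 1])  # one 3x3 filter
    y = tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding="SAME")
    with tf.Session(config=config) as sess:
        return sess.run(y).shape


# Usage, on a machine with tensorflow-gpu 1.x installed:
#   smoke_test_conv(gpu_settings(allow_growth=True))
#   smoke_test_conv(gpu_settings(allow_growth=False, memory_fraction=0.1))
```

Running the check in a fresh Python process keeps it independent of any GPU memory already claimed by the main program.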

Upvotes: 2
