oriz_1st
oriz_1st

Reputation: 11

Loaded runtime CuDNN library: 8.0.5 but source was compiled with: 8.1.0. when using google colab

I try following https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/training.html#training-the-model in google colab

everything goes smoothly, built pycocotools, doing setup using object_detection/packages/tf2/setup.py , and testing using object_detection/builders/model_builder_tf2_test.py, create tfrecord, everything run smoothly without any problem

but when training is started it always fails

2021-11-24 04:51:47.954507: E tensorflow/stream_executor/cuda/cuda_dnn.cc:362] Loaded runtime CuDNN library: 8.0.5 but source was compiled with: 8.1.0.  CuDNN library needs to have matching major version and equal or higher minor version. If using a binary install, upgrade your CuDNN library.  If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.
2021-11-24 04:51:47.958479: E tensorflow/stream_executor/cuda/cuda_dnn.cc:362] Loaded runtime CuDNN library: 8.0.5 but source was compiled with: 8.1.0.  CuDNN library needs to have matching major version and equal or higher minor version. If using a binary install, upgrade your CuDNN library.  If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.

the full error is like this

2021-11-24 04:51:47.954507: E tensorflow/stream_executor/cuda/cuda_dnn.cc:362] Loaded runtime CuDNN library: 8.0.5 but source was compiled with: 8.1.0.  CuDNN library needs to have matching major version and equal or higher minor version. If using a binary install, upgrade your CuDNN library.  If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.
2021-11-24 04:51:47.958479: E tensorflow/stream_executor/cuda/cuda_dnn.cc:362] Loaded runtime CuDNN library: 8.0.5 but source was compiled with: 8.1.0.  CuDNN library needs to have matching major version and equal or higher minor version. If using a binary install, upgrade your CuDNN library.  If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.
Traceback (most recent call last):
  File "model_main_tf2.py", line 115, in <module>
    tf.compat.v1.app.run()
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 303, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "model_main_tf2.py", line 112, in main
    record_summaries=FLAGS.record_summaries)
  File "/usr/local/lib/python3.7/dist-packages/object_detection/model_lib_v2.py", line 603, in train_loop
    train_input, unpad_groundtruth_tensors)
  File "/usr/local/lib/python3.7/dist-packages/object_detection/model_lib_v2.py", line 394, in load_fine_tune_checkpoint
    _ensure_model_is_built(model, input_dataset, unpad_groundtruth_tensors)
  File "/usr/local/lib/python3.7/dist-packages/object_detection/model_lib_v2.py", line 176, in _ensure_model_is_built
    labels,
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/distribute/distribute_lib.py", line 1286, in run
    return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/distribute/distribute_lib.py", line 2849, in call_for_each_replica
    return self._call_for_each_replica(fn, args, kwargs)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/distribute/mirrored_strategy.py", line 671, in _call_for_each_replica
    self._container_strategy(), fn, args, kwargs)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/distribute/mirrored_run.py", line 86, in call_for_each_replica
    return wrapped(args, kwargs)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/def_function.py", line 885, in __call__
    result = self._call(*args, **kwds)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/def_function.py", line 950, in _call
    return self._stateless_fn(*args, **kwds)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/function.py", line 3040, in __call__
    filtered_flat_args, captured_inputs=graph_function.captured_inputs)  # pylint: disable=protected-access
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/function.py", line 1964, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/function.py", line 596, in call
    ctx=ctx)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.UnknownError: 2 root error(s) found.
  (0) Unknown:  Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
     [[node model/conv1_conv/Conv2D (defined at /local/lib/python3.7/dist-packages/object_detection/meta_architectures/faster_rcnn_meta_arch.py:1346) ]]
     [[Loss/RPNLoss/BalancedPositiveNegativeSampler_1/Cast_8/_588]]
  (1) Unknown:  Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
     [[node model/conv1_conv/Conv2D (defined at /local/lib/python3.7/dist-packages/object_detection/meta_architectures/faster_rcnn_meta_arch.py:1346) ]]
0 successful operations.
0 derived errors ignored. [Op:__inference__dummy_computation_fn_44910]

I've been trying to use a lower version of TensorFlow like 2.4.0 but the problem is still there

Upvotes: 1

Views: 2021

Answers (2)

Jeroen
Jeroen

Reputation: 42

This may not work always and may not be safe, but as a workaround for the time being, you may try running !apt install --allow-change-held-packages libcudnn8=8.1.0.77-1+cuda11.2, see their Github

Upvotes: 0

Szmate
Szmate

Reputation: 21

I was dealing with same problem, you need to check versions here: https://www.tensorflow.org/install/source#gpu Tensorflow Object detection uses Tensorflow 2.6.0, so you need to have cuDNN with 8.1, however Colab runtime uses 8.0.5. I solved that going here: https://developer.nvidia.com/cudnn registered and downloaded

cudnn-11.2-linux-x64-v8.1.0.77.tgz

Later I uploaded it to my drive and run colab with my drive mounted. In colab notebook that uses object_detection I placed in first cell:

!tar -zvxf /content/drive/MyDrive/task/cudnn-11.2-linux-x64-v8.1.0.77.tgz

and later

%%bash
cd cuda/include
sudo cp *.h /usr/local/cuda/include/

That solved my problem.

Upvotes: 2

Related Questions