alwi

Reputation: 429

How to set up TensorFlow for an RTX 3070 on Windows?

I'm using Windows 10 and trying to get my TensorFlow scripts to work with my new RTX 3070 GPU. Previously I had everything working on a GTX 980.
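A minimal check (nothing specific to my setup) to confirm that TensorFlow at least detects the card:

import tensorflow as tf

# Quick sanity check: does TensorFlow see the GPU at all?
print("TF version:", tf.__version__)
print("Visible GPUs:", tf.config.list_physical_devices('GPU'))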

Current behavior

I'm getting the following error:

2021-01-25 21:36:01.042433: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
Epoch 1/500
2021-01-25 21:36:03.304809: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-01-25 21:36:03.880223: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-01-25 21:36:03.911531: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2021-01-25 21:36:04.515409: E tensorflow/stream_executor/cuda/cuda_dnn.cc:336] Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2021-01-25 21:36:04.515498: E tensorflow/stream_executor/cuda/cuda_dnn.cc:340] Error retrieving driver version: Unimplemented: kernel reported driver version not implemented on Windows
2021-01-25 21:36:04.515607: W tensorflow/core/framework/op_kernel.cc:1763] OP_REQUIRES failed at cudnn_rnn_ops.cc:1514 : Unknown: Fail to find the dnn implementation.
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "C:\Users\Aleksander\.IntelliJIdea2018.3\config\plugins\python\helpers\pydev\_pydev_bundle\pydev_umd.py", line 197, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "C:\Users\Aleksander\.IntelliJIdea2018.3\config\plugins\python\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "C:/Workspace_GpwScan/dnn/sandbox/reproduce_issue.py", line 110, in <module>
    callbacks=[checkpoint, tensorboard])
  File "C:\Workspace_GpwScan\stubs\tensorflow\python\keras\engine\training.py", line 1100, in fit
    tmp_logs = self.train_function(iterator)
  File "C:\Workspace_GpwScan\stubs\tensorflow\python\eager\def_function.py", line 828, in __call__
    result = self._call(*args, **kwds)
  File "C:\Workspace_GpwScan\stubs\tensorflow\python\eager\def_function.py", line 888, in _call
    return self._stateless_fn(*args, **kwds)
  File "C:\Workspace_GpwScan\stubs\tensorflow\python\eager\function.py", line 2943, in __call__
    filtered_flat_args, captured_inputs=graph_function.captured_inputs)  # pylint: disable=protected-access
  File "C:\Workspace_GpwScan\stubs\tensorflow\python\eager\function.py", line 1919, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "C:\Workspace_GpwScan\stubs\tensorflow\python\eager\function.py", line 560, in call
    ctx=ctx)
  File "C:\Workspace_GpwScan\stubs\tensorflow\python\eager\execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.UnknownError:    Fail to find the dnn implementation.
     [[{{node CudnnRNN}}]]
     [[sequential/lstm/PartitionedCall]] [Op:__inference_train_function_8782]
Function call stack:
train_function -> train_function -> train_function

tf_2.4.1_issue_on_3070.txt

I also tried the latest nightly build, 2.5.0.dev20210125, and ended up with this error:

2021-01-25 21:31:05.429799: E tensorflow/stream_executor/dnn.cc:618] CUDNN_STATUS_EXECUTION_FAILED
in tensorflow/stream_executor/cuda/cuda_dnn.cc(1975): 'cudnnRNNBackwardData( cudnn.handle(), rnn_desc.handle(), model_dims.max_seq_length, output_desc.handles(), output_data.opaque(), output_desc.handles(), output_backprop_data.opaque(), output_h_desc.handle(), output_h_backprop_data.opaque(), output_c_desc.handle(), output_c_backprop_data.opaque(), rnn_desc.params_handle(), params.opaque(), input_h_desc.handle(), input_h_data.opaque(), input_c_desc.handle(), input_c_data.opaque(), input_desc.handles(), input_backprop_data->opaque(), input_h_desc.handle(), input_h_backprop_data->opaque(), input_c_desc.handle(), input_c_backprop_data->opaque(), workspace.opaque(), workspace.size(), reserve_space_data->opaque(), reserve_space_data->size())'
2021-01-25 21:31:05.430291: W tensorflow/core/framework/op_kernel.cc:1763] OP_REQUIRES failed at cudnn_rnn_ops.cc:1926 : Internal: Failed to call ThenRnnBackward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 1, 128, 1, 128, 256, 128] 
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "C:\Users\Aleksander\.IntelliJIdea2018.3\config\plugins\python\helpers\pydev\_pydev_bundle\pydev_umd.py", line 197, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "C:\Users\Aleksander\.IntelliJIdea2018.3\config\plugins\python\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "C:/Workspace_GpwScan/dnn/sandbox/reproduce_issue.py", line 108, in <module>
    callbacks=[checkpoint, tensorboard])
  File "C:\Workspace_GpwScan\stubs\tensorflow\python\keras\engine\training.py", line 1134, in fit
    tmp_logs = self.train_function(iterator)
  File "C:\Workspace_GpwScan\stubs\tensorflow\python\eager\def_function.py", line 818, in __call__
    result = self._call(*args, **kwds)
  File "C:\Workspace_GpwScan\stubs\tensorflow\python\eager\def_function.py", line 846, in _call
    return self._stateless_fn(*args, **kwds)  # pylint: disable=not-callable
  File "C:\Workspace_GpwScan\stubs\tensorflow\python\eager\function.py", line 2994, in __call__
    filtered_flat_args, captured_inputs=graph_function.captured_inputs)  # pylint: disable=protected-access
  File "C:\Workspace_GpwScan\stubs\tensorflow\python\eager\function.py", line 1939, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "C:\Workspace_GpwScan\stubs\tensorflow\python\eager\function.py", line 569, in call
    ctx=ctx)
  File "C:\Workspace_GpwScan\stubs\tensorflow\python\eager\execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.InternalError:    Failed to call ThenRnnBackward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 1, 128, 1, 128, 256, 128] 
     [[{{node gradients/CudnnRNN_grad/CudnnRNNBackprop}}]]
     [[Adam/gradients/PartitionedCall_2]] [Op:__inference_train_function_8936]
Function call stack:
train_function -> train_function -> train_function

tf_nightly_issue_on_3070.txt

Standalone code to reproduce the issue

import datetime
import os

import pandas as pd
from numpy import reshape

import tensorflow as tf

EPOCHS = 500
BATCH_SIZE = 256
TEST_SET_RATIO = 0.2

LEARNING_RATE = 0.001
DECAY = 3e-5
LOSS_FUNC = 'categorical_crossentropy'
DROPOUT = 0.2
OUTPUT_PATH = "e:\\ml"

RNN_SEQ_LEN = 128  # number of RNN/LSTM sequence features
L_AMOUNT = 2  # number of labels

MIN_ACC_TO_SAVE_MODEL = 0.6


def create_model():
    new_model = tf.keras.models.Sequential()

    # NETWORK INPUT
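    # input_shape comes from TR_FEATURES, a module-level global defined in the __main__ block below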
    new_model.add(tf.keras.layers.LSTM(RNN_SEQ_LEN, input_shape=TR_FEATURES.shape[1:], return_sequences=True))
    new_model.add(tf.keras.layers.Dropout(DROPOUT))
    new_model.add(tf.keras.layers.BatchNormalization())

    new_model.add(tf.keras.layers.LSTM(RNN_SEQ_LEN, return_sequences=True))
    new_model.add(tf.keras.layers.Dropout(DROPOUT / 2))
    new_model.add(tf.keras.layers.BatchNormalization())

    new_model.add(tf.keras.layers.LSTM(RNN_SEQ_LEN))
    new_model.add(tf.keras.layers.Dropout(DROPOUT))
    new_model.add(tf.keras.layers.BatchNormalization())

    # NETWORK OUTPUT
    new_model.add(tf.keras.layers.Dense(L_AMOUNT, activation=tf.keras.activations.softmax))

    opt = tf.keras.optimizers.Adam(LEARNING_RATE, decay=DECAY)
    new_model.compile(optimizer=opt,
                      loss=LOSS_FUNC,
                      metrics=['accuracy'])

    print(new_model.summary())
    return new_model


class CustomModelCheckpoint(tf.keras.callbacks.ModelCheckpoint):
    def __init__(self, fp, monitor='val_loss', verbose=0, save_best_only=False, save_weights_only=False, mode='auto', save_freq='epoch', **kwargs):
        super().__init__(fp, monitor, verbose, save_best_only, save_weights_only, mode, save_freq, **kwargs)

    def on_epoch_end(self, epoch, logs=None):
        print("\n-------------------------------------------------------------------------------------------------------")
        print(f"epoch: {epoch}, training_acc: {round(float(logs['accuracy']), 4)}, validation_acc: {round(float(logs['val_accuracy']), 4)}")
        print("-------------------------------------------------------------------------------------------------------\n")

        if MIN_ACC_TO_SAVE_MODEL <= logs['accuracy']:
            super().on_epoch_end(epoch, logs)


if __name__ == '__main__':
    data_filename = 'train_2020-02-07_pp_x128_3_2_all.csv'
    print("Loading data file: %s" % data_filename)
    dataset = pd.read_csv(data_filename, delimiter=',', header=None)
    dataset = dataset.drop(columns=[0, 1, 2, 3, 4, 5, 6]).values  # drop columns with additional information

    test_set_size = int(len(dataset) * TEST_SET_RATIO)
    print("Test set split at: %d" % test_set_size)

    train_data = dataset[:-test_set_size]
    test_data = dataset[-test_set_size:]  # use most recent data for validation (extract before shuffle)

    TR_F = train_data[:, 0:RNN_SEQ_LEN]
    TS_F = test_data[:, 0:RNN_SEQ_LEN]

    TR_L = train_data[:, RNN_SEQ_LEN:RNN_SEQ_LEN + L_AMOUNT]
    TS_L = test_data[:, RNN_SEQ_LEN:RNN_SEQ_LEN + L_AMOUNT]

    TR_FEATURES = reshape(TR_F, (len(TR_F), RNN_SEQ_LEN, 1))
    TS_FEATURES = reshape(TS_F, (len(TS_F), RNN_SEQ_LEN, 1))

    model = create_model()

    TRAINING_TIMESTAMP = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    model_name = "sscce_%s" % TRAINING_TIMESTAMP
    os.mkdir("%s\\models\\%s" % (OUTPUT_PATH, model_name))
    filepath = "%s\\models\\%s\\%s--{epoch:02d}-{val_accuracy:.3f}.model" % (OUTPUT_PATH, model_name, model_name)
    checkpoint = CustomModelCheckpoint(filepath,
                                       monitor='val_accuracy',
                                       verbose=1,
                                       save_best_only=True,
                                       mode='max')

    log_dir = "%s\\logs\\fit\\%s.model" % (OUTPUT_PATH, model_name)
    tensorboard = tf.keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1, profile_batch=0)

    model.fit(x=TR_FEATURES,
              y=TR_L,
              epochs=EPOCHS,
              batch_size=BATCH_SIZE,
              shuffle=True,
              validation_data=(TS_FEATURES, TS_L),
              callbacks=[checkpoint, tensorboard])

DATA FILE SAMPLE: input_data.zip

Other info / logs

I'm also providing the path to a CUDA 11.0 installation, because without it I get errors like:

2021-01-25 21:44:15.989317: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cusolver64_10.dll'; dlerror: cusolver64_10.dll not found

Full Windows system PATH:

Path=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\libnvvp;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\bin;C:\cudnn-11.1-v8.0.5.39\bin;C:\Python36\Scripts\;C:\Python36;C:\ProgramData\DockerDesktop\version-bin;C:\Program Files\Docker\Docker\Resources\bin;c:\Java\jdk1.8.0_144_x86;C:\gradle-6.0.1\bin;C:\SVN\bin;C:\MinGW\bin;C:\WinAVR-20100110\;c:\avrdude\;c:\Android\sdk\platform-tools;C:\adb\;C:\TortoiseGit\bin;C:\Git4Windows\cmd;c:\sqlite-tools-win32-x86-3130000\;C:\WINDOWS\System32;C:\WINDOWS;C:\WINDOWS\System32\wbem;C:\WINDOWS\System32\WindowsPowerShell\v1.0\;C:\Program Files (x86)\Bitvise SSH Client;C:\Program Files (x86)\Windows Live\Shared;C:\WINDOWS\system32;C:\WINDOWS\System32\Wbem;C:\WINDOWS\System32\OpenSSH\;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\WINDOWS\System32\WindowsPowerShell\v1.0\;C:\WINDOWS\System32\OpenSSH\;C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common;C:\Program Files\NVIDIA Corporation\Nsight Compute 2020.3.0\;C:\Program Files\NVIDIA Corporation\NVIDIA NvDLISR

I've been trying different combinations of CUDA/cuDNN/TensorFlow just for the sake of it, but in practice only cuda_11.2.0_460.89_win10 ships with a Windows NVIDIA GPU driver recent enough to support the RTX 30xx series. Still, there is no cuDNN build designated specifically for CUDA 11.2 yet... Maybe that is the issue.
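To see which CUDA and cuDNN versions a given TensorFlow wheel was actually built against (rather than guessing from installer file names), a small check like this should work on TF 2.3+:

import tensorflow as tf

# Report the CUDA/cuDNN versions this TensorFlow build expects, so they can
# be compared against what is installed and on PATH.
build = tf.sysconfig.get_build_info()
print("TF version:", tf.__version__)
print("Built against CUDA:", build.get("cuda_version"))
print("Built against cuDNN:", build.get("cudnn_version"))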

Any idea how to make it all work together?

Upvotes: 0

Views: 5122

Answers (2)

NikoNyrh

Reputation: 4138

I had a similar problem, having previously used TF 2.4, CUDA 11.0 and cuDNN 8.0. I don't know why a simple network worked in that configuration while more complex ones didn't; apparently my simpler networks didn't use cuDNN?

Anyway, everything works after upgrading to TF 2.5, CUDA 11.2 and cuDNN 8.1. In the future it is best to check the compatible library versions on tensorflow.org.
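To double-check that the new stack actually exercises the cuDNN RNN kernel, a tiny LSTM fit like this (a minimal sketch with random data, roughly mirroring the shapes from the question) should run without the CudnnRNN error:

import numpy as np
import tensorflow as tf

# Tiny LSTM smoke test: if this trains for one epoch without a cuDNN error,
# the installed CUDA/cuDNN/TF combination handles the CudnnRNN op.
x = np.random.rand(256, 128, 1).astype("float32")
y = tf.keras.utils.to_categorical(np.random.randint(0, 2, size=(256,)), 2)

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(128, input_shape=(128, 1)),
    tf.keras.layers.Dense(2, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy")
model.fit(x, y, batch_size=256, epochs=1)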

Upvotes: 0

alwi

Reputation: 429

I've rolled back to CUDA 11.0 with the matching cuDNN 8.0.2 and TensorFlow 2.4.1 just to double-check it, and this combination:

cudnn-11.0-windows-x64-v8.0.2.39.zip
cuda_11.0.2_451.48_win10.exe
latest stable TensorFlow 2.4.1
NVIDIA GPU drivers updated to 461.40, since the 451.48 driver packaged with the above CUDA installer won't work with the RTX 3070

... gives:

2021-02-04 19:36:59.700433: E tensorflow/stream_executor/cuda/cuda_dnn.cc:336] Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2021-02-04 19:36:59.700523: E tensorflow/stream_executor/cuda/cuda_dnn.cc:340] Error retrieving driver version: Unimplemented: kernel reported driver version not implemented on Windows
2021-02-04 19:36:59.700630: W tensorflow/core/framework/op_kernel.cc:1763] OP_REQUIRES failed at cudnn_rnn_ops.cc:1514 : Unknown: Fail to find the dnn implementation.

Eventually it started working with the recently released cudnn-11.2-windows-x64-v8.1.0.77.zip together with the 2.5 nightly, but obviously only in combination with cuda_11.2.0_460.89_win10.exe.
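For anyone who still hits CUDNN_STATUS_NOT_INITIALIZED with matching versions, a commonly suggested workaround (just a sketch, and not what fixed it here) is to enable GPU memory growth before building the model:

import tensorflow as tf

# Allocate GPU memory on demand instead of reserving it all up front;
# on some setups this prevents cuDNN from failing to create its handle.
for gpu in tf.config.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)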

Upvotes: 1
