alwi

Reputation: 429

How to set up TensorFlow for an RTX 3070 on Windows?

I'm using Windows 10 and trying to get my TensorFlow scripts to work with my new RTX 3070 GPU. Previously I had everything working on a GTX 980.
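A minimal check (nothing specific to my setup) to confirm that TensorFlow at least detects the card:

import tensorflow as tf

# Quick sanity check: does TensorFlow see the GPU at all?
print("TF version:", tf.__version__)
print("Visible GPUs:", tf.config.list_physical_devices('GPU'))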

Current behavior

I'm getting the following error:

2021-01-25 21:36:01.042433: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
Epoch 1/500
2021-01-25 21:36:03.304809: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-01-25 21:36:03.880223: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-01-25 21:36:03.911531: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2021-01-25 21:36:04.515409: E tensorflow/stream_executor/cuda/cuda_dnn.cc:336] Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2021-01-25 21:36:04.515498: E tensorflow/stream_executor/cuda/cuda_dnn.cc:340] Error retrieving driver version: Unimplemented: kernel reported driver version not implemented on Windows
2021-01-25 21:36:04.515607: W tensorflow/core/framework/op_kernel.cc:1763] OP_REQUIRES failed at cudnn_rnn_ops.cc:1514 : Unknown: Fail to find the dnn implementation.
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "C:\Users\Aleksander\.IntelliJIdea2018.3\config\plugins\python\helpers\pydev\_pydev_bundle\pydev_umd.py", line 197, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "C:\Users\Aleksander\.IntelliJIdea2018.3\config\plugins\python\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "C:/Workspace_GpwScan/dnn/sandbox/reproduce_issue.py", line 110, in <module>
    callbacks=[checkpoint, tensorboard])
  File "C:\Workspace_GpwScan\stubs\tensorflow\python\keras\engine\training.py", line 1100, in fit
    tmp_logs = self.train_function(iterator)
  File "C:\Workspace_GpwScan\stubs\tensorflow\python\eager\def_function.py", line 828, in __call__
    result = self._call(*args, **kwds)
  File "C:\Workspace_GpwScan\stubs\tensorflow\python\eager\def_function.py", line 888, in _call
    return self._stateless_fn(*args, **kwds)
  File "C:\Workspace_GpwScan\stubs\tensorflow\python\eager\function.py", line 2943, in __call__
    filtered_flat_args, captured_inputs=graph_function.captured_inputs)  # pylint: disable=protected-access
  File "C:\Workspace_GpwScan\stubs\tensorflow\python\eager\function.py", line 1919, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "C:\Workspace_GpwScan\stubs\tensorflow\python\eager\function.py", line 560, in call
    ctx=ctx)
  File "C:\Workspace_GpwScan\stubs\tensorflow\python\eager\execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.UnknownError:    Fail to find the dnn implementation.
     [[{{node CudnnRNN}}]]
     [[sequential/lstm/PartitionedCall]] [Op:__inference_train_function_8782]
Function call stack:
train_function -> train_function -> train_function

tf_2.4.1_issue_on_3070.txt

I also tried the latest nightly build, 2.5.0.dev20210125, and ended up with this error:

2021-01-25 21:31:05.429799: E tensorflow/stream_executor/dnn.cc:618] CUDNN_STATUS_EXECUTION_FAILED
in tensorflow/stream_executor/cuda/cuda_dnn.cc(1975): 'cudnnRNNBackwardData( cudnn.handle(), rnn_desc.handle(), model_dims.max_seq_length, output_desc.handles(), output_data.opaque(), output_desc.handles(), output_backprop_data.opaque(), output_h_desc.handle(), output_h_backprop_data.opaque(), output_c_desc.handle(), output_c_backprop_data.opaque(), rnn_desc.params_handle(), params.opaque(), input_h_desc.handle(), input_h_data.opaque(), input_c_desc.handle(), input_c_data.opaque(), input_desc.handles(), input_backprop_data->opaque(), input_h_desc.handle(), input_h_backprop_data->opaque(), input_c_desc.handle(), input_c_backprop_data->opaque(), workspace.opaque(), workspace.size(), reserve_space_data->opaque(), reserve_space_data->size())'
2021-01-25 21:31:05.430291: W tensorflow/core/framework/op_kernel.cc:1763] OP_REQUIRES failed at cudnn_rnn_ops.cc:1926 : Internal: Failed to call ThenRnnBackward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 1, 128, 1, 128, 256, 128] 
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "C:\Users\Aleksander\.IntelliJIdea2018.3\config\plugins\python\helpers\pydev\_pydev_bundle\pydev_umd.py", line 197, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "C:\Users\Aleksander\.IntelliJIdea2018.3\config\plugins\python\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "C:/Workspace_GpwScan/dnn/sandbox/reproduce_issue.py", line 108, in <module>
    callbacks=[checkpoint, tensorboard])
  File "C:\Workspace_GpwScan\stubs\tensorflow\python\keras\engine\training.py", line 1134, in fit
    tmp_logs = self.train_function(iterator)
  File "C:\Workspace_GpwScan\stubs\tensorflow\python\eager\def_function.py", line 818, in __call__
    result = self._call(*args, **kwds)
  File "C:\Workspace_GpwScan\stubs\tensorflow\python\eager\def_function.py", line 846, in _call
    return self._stateless_fn(*args, **kwds)  # pylint: disable=not-callable
  File "C:\Workspace_GpwScan\stubs\tensorflow\python\eager\function.py", line 2994, in __call__
    filtered_flat_args, captured_inputs=graph_function.captured_inputs)  # pylint: disable=protected-access
  File "C:\Workspace_GpwScan\stubs\tensorflow\python\eager\function.py", line 1939, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "C:\Workspace_GpwScan\stubs\tensorflow\python\eager\function.py", line 569, in call
    ctx=ctx)
  File "C:\Workspace_GpwScan\stubs\tensorflow\python\eager\execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.InternalError:    Failed to call ThenRnnBackward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 1, 128, 1, 128, 256, 128] 
     [[{{node gradients/CudnnRNN_grad/CudnnRNNBackprop}}]]
     [[Adam/gradients/PartitionedCall_2]] [Op:__inference_train_function_8936]
Function call stack:
train_function -> train_function -> train_function

tf_nightly_issue_on_3070.txt

Standalone code to reproduce the issue

import datetime
import os

import pandas as pd
from numpy import reshape

import tensorflow as tf

EPOCHS = 500
BATCH_SIZE = 256
TEST_SET_RATIO = 0.2

LEARNING_RATE = 0.001
DECAY = 3e-5
LOSS_FUNC = 'categorical_crossentropy'
DROPOUT = 0.2
OUTPUT_PATH = "e:\\ml"

RNN_SEQ_LEN = 128  # number of RNN/LSTM sequence features
L_AMOUNT = 2  # number of labels

MIN_ACC_TO_SAVE_MODEL = 0.6


def create_model():
    new_model = tf.keras.models.Sequential()

    # NETWORK INPUT
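    # input_shape comes from TR_FEATURES, a module-level global defined in the __main__ block below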
    new_model.add(tf.keras.layers.LSTM(RNN_SEQ_LEN, input_shape=TR_FEATURES.shape[1:], return_sequences=True))
    new_model.add(tf.keras.layers.Dropout(DROPOUT))
    new_model.add(tf.keras.layers.BatchNormalization())

    new_model.add(tf.keras.layers.LSTM(RNN_SEQ_LEN, return_sequences=True))
    new_model.add(tf.keras.layers.Dropout(DROPOUT / 2))
    new_model.add(tf.keras.layers.BatchNormalization())

    new_model.add(tf.keras.layers.LSTM(RNN_SEQ_LEN))
    new_model.add(tf.keras.layers.Dropout(DROPOUT))
    new_model.add(tf.keras.layers.BatchNormalization())

    # NETWORK OUTPUT
    new_model.add(tf.keras.layers.Dense(L_AMOUNT, activation=tf.keras.activations.softmax))

    opt = tf.keras.optimizers.Adam(LEARNING_RATE, decay=DECAY)
    new_model.compile(optimizer=opt,
                      loss=LOSS_FUNC,
                      metrics=['accuracy'])

    print(new_model.summary())
    return new_model


class CustomModelCheckpoint(tf.keras.callbacks.ModelCheckpoint):
    def __init__(self, fp, monitor='val_loss', verbose=0, save_best_only=False, save_weights_only=False, mode='auto', save_freq='epoch', **kwargs):
        super().__init__(fp, monitor, verbose, save_best_only, save_weights_only, mode, save_freq, **kwargs)

    def on_epoch_end(self, epoch, logs=None):
        print("\n-------------------------------------------------------------------------------------------------------")
        print(f"epoch: {epoch}, training_acc: {round(float(logs['accuracy']), 4)}, validation_acc: {round(float(logs['val_accuracy']), 4)}")
        print("-------------------------------------------------------------------------------------------------------\n")

        if MIN_ACC_TO_SAVE_MODEL <= logs['accuracy']:
            super().on_epoch_end(epoch, logs)


if __name__ == '__main__':
    data_filename = 'train_2020-02-07_pp_x128_3_2_all.csv'
    print("Loading data file: %s" % data_filename)
    dataset = pd.read_csv(data_filename, delimiter=',', header=None)
    dataset = dataset.drop(columns=[0, 1, 2, 3, 4, 5, 6]).values  # drop columns with additional information

    test_set_size = int(len(dataset) * TEST_SET_RATIO)
    print("Test set split at: %d" % test_set_size)

    train_data = dataset[:-test_set_size]
    test_data = dataset[-test_set_size:]  # use most recent data for validation (extract before shuffle)

    TR_F = train_data[:, 0:RNN_SEQ_LEN]
    TS_F = test_data[:, 0:RNN_SEQ_LEN]

    TR_L = train_data[:, RNN_SEQ_LEN:RNN_SEQ_LEN + L_AMOUNT]
    TS_L = test_data[:, RNN_SEQ_LEN:RNN_SEQ_LEN + L_AMOUNT]

    TR_FEATURES = reshape(TR_F, (len(TR_F), RNN_SEQ_LEN, 1))
    TS_FEATURES = reshape(TS_F, (len(TS_F), RNN_SEQ_LEN, 1))

    model = create_model()

    TRAINING_TIMESTAMP = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    model_name = "sscce_%s" % TRAINING_TIMESTAMP
    os.mkdir("%s\\models\\%s" % (OUTPUT_PATH, model_name))
    filepath = "%s\\models\\%s\\%s--{epoch:02d}-{val_accuracy:.3f}.model" % (OUTPUT_PATH, model_name, model_name)
    checkpoint = CustomModelCheckpoint(filepath,
                                       monitor='val_accuracy',
                                       verbose=1,
                                       save_best_only=True,
                                       mode='max')

    log_dir = "%s\\logs\\fit\\%s.model" % (OUTPUT_PATH, model_name)
    tensorboard = tf.keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1, profile_batch=0)

    model.fit(x=TR_FEATURES,
              y=TR_L,
              epochs=EPOCHS,
              batch_size=BATCH_SIZE,
              shuffle=True,
              validation_data=(TS_FEATURES, TS_L),
              callbacks=[checkpoint, tensorboard])

DATA FILE SAMPLE: input_data.zip

Other info / logs

I'm also providing the path to a CUDA 11.0 installation, because without it I get errors like:

2021-01-25 21:44:15.989317: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cusolver64_10.dll'; dlerror: cusolver64_10.dll not found

Full Windows system PATH:

Path=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\libnvvp;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\bin;C:\cudnn-11.1-v8.0.5.39\bin;C:\Python36\Scripts\;C:\Python36;C:\ProgramData\DockerDesktop\version-bin;C:\Program Files\Docker\Docker\Resources\bin;c:\Java\jdk1.8.0_144_x86;C:\gradle-6.0.1\bin;C:\SVN\bin;C:\MinGW\bin;C:\WinAVR-20100110\;c:\avrdude\;c:\Android\sdk\platform-tools;C:\adb\;C:\TortoiseGit\bin;C:\Git4Windows\cmd;c:\sqlite-tools-win32-x86-3130000\;C:\WINDOWS\System32;C:\WINDOWS;C:\WINDOWS\System32\wbem;C:\WINDOWS\System32\WindowsPowerShell\v1.0\;C:\Program Files (x86)\Bitvise SSH Client;C:\Program Files (x86)\Windows Live\Shared;C:\WINDOWS\system32;C:\WINDOWS\System32\Wbem;C:\WINDOWS\System32\OpenSSH\;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\WINDOWS\System32\WindowsPowerShell\v1.0\;C:\WINDOWS\System32\OpenSSH\;C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common;C:\Program Files\NVIDIA Corporation\Nsight Compute 2020.3.0\;C:\Program Files\NVIDIA Corporation\NVIDIA NvDLISR

I've been trying different combinations of CUDA/cuDNN/TensorFlow just for the sake of it, but in practice only cuda_11.2.0_460.89_win10 ships with a Windows NVIDIA GPU driver recent enough to support the RTX 30xx series. Still, there is no cuDNN build designated specifically for CUDA 11.2 yet... Maybe that is the issue.
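To see which CUDA and cuDNN versions a given TensorFlow wheel was actually built against (rather than guessing from installer file names), a small check like this should work on TF 2.3+:

import tensorflow as tf

# Report the CUDA/cuDNN versions this TensorFlow build expects, so they can
# be compared against what is installed and on PATH.
build = tf.sysconfig.get_build_info()
print("TF version:", tf.__version__)
print("Built against CUDA:", build.get("cuda_version"))
print("Built against cuDNN:", build.get("cudnn_version"))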

Any idea how to make it all work together?

Upvotes: 0

Views: 5122

Answers (2)

NikoNyrh

Reputation: 4138

I had a similar problem, having previously used TF 2.4, CUDA 11.0 and cuDNN 8.0. I don't know why a simple network worked in that configuration while more complex ones didn't; apparently my simpler networks didn't use cuDNN?

Anyway, everything works after upgrading to TF 2.5, CUDA 11.2 and cuDNN 8.1. In the future it is best to check the compatible library versions on tensorflow.org.
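To double-check that the new stack actually exercises the cuDNN RNN kernel, a tiny LSTM fit like this (a minimal sketch with random data, roughly mirroring the shapes from the question) should run without the CudnnRNN error:

import numpy as np
import tensorflow as tf

# Tiny LSTM smoke test: if this trains for one epoch without a cuDNN error,
# the installed CUDA/cuDNN/TF combination handles the CudnnRNN op.
x = np.random.rand(256, 128, 1).astype("float32")
y = tf.keras.utils.to_categorical(np.random.randint(0, 2, size=(256,)), 2)

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(128, input_shape=(128, 1)),
    tf.keras.layers.Dense(2, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy")
model.fit(x, y, batch_size=256, epochs=1)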

Upvotes: 0

alwi

Reputation: 429

I've rolled back to CUDA 11.0 with the matching cuDNN 8.0.2 and TensorFlow 2.4.1 just to double-check it, and this combination:

cudnn-11.0-windows-x64-v8.0.2.39.zip
cuda_11.0.2_451.48_win10.exe
latest stable TensorFlow 2.4.1
NVIDIA GPU drivers updated to 461.40, since the 451.48 driver packaged with the above CUDA installer won't work with the RTX 3070

... gives:

2021-02-04 19:36:59.700433: E tensorflow/stream_executor/cuda/cuda_dnn.cc:336] Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2021-02-04 19:36:59.700523: E tensorflow/stream_executor/cuda/cuda_dnn.cc:340] Error retrieving driver version: Unimplemented: kernel reported driver version not implemented on Windows
2021-02-04 19:36:59.700630: W tensorflow/core/framework/op_kernel.cc:1763] OP_REQUIRES failed at cudnn_rnn_ops.cc:1514 : Unknown: Fail to find the dnn implementation.

Eventually it started working with the recently released cudnn-11.2-windows-x64-v8.1.0.77.zip together with the 2.5 nightly, but obviously only in combination with cuda_11.2.0_460.89_win10.exe.
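For anyone who still hits CUDNN_STATUS_NOT_INITIALIZED with matching versions, a commonly suggested workaround (just a sketch, and not what fixed it here) is to enable GPU memory growth before building the model:

import tensorflow as tf

# Allocate GPU memory on demand instead of reserving it all up front;
# on some setups this prevents cuDNN from failing to create its handle.
for gpu in tf.config.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)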

Upvotes: 1
