Ryan Park

Reputation: 65

Python TensorRT: CUDNN_STATUS_MAPPING_ERROR Error

When running a facial recognition algorithm using TensorRT's Python API (together with PyCUDA), I encounter the following error:

[TensorRT] ERROR: ../rtSafe/safeContext.cpp (133) - Cudnn Error in configure: 7 (CUDNN_STATUS_MAPPING_ERROR)

[TensorRT] ERROR: FAILED_EXECUTION: std::exception

The code still compiles and runs, but the results are inaccurate: the program's output jumps between 0.999999907141816 and 0 when a more continuous range of values is expected. I've tested the same model with TF-TRT and with Keras, and my code works in both (with small changes to fit the differences between the TF and Keras APIs).

I've tried installing different versions of CUDA (9.0, 10.0, and 10.1) and cuDNN (7.6.3 and 7.6.5). My TensorRT version is 6.0.1.5, and PyCUDA is 2019.1.2. If it helps, I'm running this on Ubuntu 18.04.

Any help would be appreciated!

Update: I think the error is caused by running a TensorFlow session at the same time. Specifically, I'm using the mtcnn package (link), which might interfere with TensorRT. When mtcnn initializes a TF session, the above error occurs; when mtcnn is not used, the error does not occur and everything runs as expected.
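Roughly, the failing combination looks like this (simplified sketch; engine deserialization and the actual inference code are omitted):

import pycuda.autoinit  # creates the CUDA context that TensorRT will use
import tensorrt as trt
from mtcnn.mtcnn import MTCNN

detector = MTCNN()  # this is where a TensorFlow session gets initialized
# ... deserialize the TensorRT engine and run inference as usual ...
# -> [TensorRT] ERROR: ... Cudnn Error in configure: 7 (CUDNN_STATUS_MAPPING_ERROR)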

Upvotes: 0

Views: 3492

Answers (1)

Ryan Park

Reputation: 65

Fixed it: the error seems to have been caused by GPU memory conflicts between TensorFlow and TensorRT. Because I'm using both simultaneously in the same program, I assume they clashed over how GPU memory is allocated. The solution was to enter a TensorFlow session with allow_growth=True before allocating buffers and creating an asynchronous stream with PyCUDA and TensorRT.

  1. Enter a TensorFlow session (must happen prior to step 2):
import tensorflow as tf
tf.Session(config=tf.ConfigProto(gpu_options=tf.GPUOptions(allow_growth=True))).__enter__()
  2. Allocate buffers (after step 1, from https://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html#serial_model_python):
import numpy as np
import pycuda.driver as cuda  # a CUDA context must already exist (e.g. via pycuda.autoinit)
import tensorrt as trt

# `engine` is a deserialized TensorRT engine (see the linked guide).
h_input = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(0)), dtype=np.float32)
h_output = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(1)), dtype=np.float32)
# Allocate device memory for inputs and outputs.
d_input = cuda.mem_alloc(h_input.nbytes)
d_output = cuda.mem_alloc(h_output.nbytes)
# Create a stream in which to copy inputs/outputs and run inference.
stream = cuda.Stream()
  3. Run inference (see https://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html#perform_inference_python); a sketch of that pattern is below.
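For reference, the inference step in that section of the guide follows this general pattern (a sketch, assuming `engine`, the buffers, and `stream` from step 2 are in scope):

with engine.create_execution_context() as context:
    # Copy the input batch to the GPU.
    cuda.memcpy_htod_async(d_input, h_input, stream)
    # Run inference asynchronously on the same stream.
    context.execute_async(bindings=[int(d_input), int(d_output)], stream_handle=stream.handle)
    # Copy the prediction back to the host.
    cuda.memcpy_dtoh_async(h_output, d_output, stream)
    # Block until the stream has finished all work.
    stream.synchronize()

The important part is that all of this happens after the TensorFlow session from step 1 has been entered.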

Upvotes: 1
