Pedram

Reputation: 93

Deeplab to TensorRT conversion

Converting my DeepLab TensorFlow model to a TensorRT model increases the inference time dramatically. What am I doing wrong in my code?

Here I do the conversion from the TensorFlow graph to a TensorRT graph and save the new TRT model:

import tensorflow
from tensorflow.python.platform import gfile
import tensorflow.contrib.tensorrt as trt

OUTPUT_NAME = ["SemanticPredictions"]

# read the frozen TensorFlow graph
with gfile.FastGFile('/frozen_inference_graph.pb', 'rb') as tf_model:
   tf_graphf = tensorflow.GraphDef()
   tf_graphf.ParseFromString(tf_model.read())

# convert (optimize) the frozen model to a TensorRT model
trt_graph = trt.create_inference_graph(
   input_graph_def=tf_graphf,
   outputs=OUTPUT_NAME,
   max_batch_size=2,
   max_workspace_size_bytes=2 * (10 ** 9),
   precision_mode="INT8")

# write the TensorRT model to disk to be used later for inference
with gfile.FastGFile("TensorRT_model.pb", 'wb') as f:
   f.write(trt_graph.SerializeToString())
print("TensorRT model is successfully stored!")

In another script, I load this TRT model again and make a semantic segmentation prediction with it, but it is about 7 to 8 times slower! Here is the second script:

import cv2
import numpy as np
import tensorflow
from tensorflow.python.platform import gfile

gpu_options = tensorflow.GPUOptions(per_process_gpu_memory_fraction=0.50)
with tensorflow.Session(config=tensorflow.ConfigProto(gpu_options=gpu_options)) as sess:
   img_array = cv2.imread('test.png', 1)

   # read the TensorRT frozen graph
   with gfile.FastGFile('TensorRT_model.pb', 'rb') as trt_model:
      trt_graph = tensorflow.GraphDef()
      trt_graph.ParseFromString(trt_model.read())

   # import the graph and obtain the corresponding input and output tensors
   tensorflow.import_graph_def(trt_graph, name='')
   input = sess.graph.get_tensor_by_name('ImageTensor:0')
   output = sess.graph.get_tensor_by_name('SemanticPredictions:0')

   # perform inference and colorize the resulting segmentation map
   batch_seg_map = sess.run(output, feed_dict={input: [img_array]})
   seg_map = batch_seg_map[0]
   seg_img = label_to_color_image(seg_map).astype(np.uint8)

Any ideas on how I should perform the conversion properly so that it actually speeds up inference?

Upvotes: 0

Views: 1758

Answers (3)

aniket

Reputation: 101

I have converted my DeepLabv3+ model into a TensorRT-optimized pb graph using the TF-TRT developer guide. I run my models on a Jetson Nano developer kit. From my experience, I think you need to check the following things:

  1. Does your hardware (GPU) support INT8? In my case the Jetson Nano does not support INT8 (the graph was converted, but inference took longer). While researching this I found that the GPU should have FP16/FP32 Tensor Cores to run the models as expected. Refer here.

  2. Check your TensorFlow model for ops that are unsupported at INT8/FP16/FP32 precision. For DeepLabv3+ I get similar performance (time and IoU) with the FP16 and FP32 optimized graphs; for INT8, the calibration fails. Refer here. For checking the supported ops, refer here. (A rough FP16 conversion sketch follows this list.)
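
A minimal sketch of trying FP16 instead of INT8 with tf.contrib.tensorrt (TF 1.x), assuming the same frozen graph path and output node name as in the question; TensorRT_model_fp16.pb is just a placeholder filename:

import tensorflow as tf
from tensorflow.python.platform import gfile
import tensorflow.contrib.tensorrt as trt

# read the frozen graph (same file as in the question)
with gfile.FastGFile('/frozen_inference_graph.pb', 'rb') as f:
    frozen_graph = tf.GraphDef()
    frozen_graph.ParseFromString(f.read())

# FP16 conversion; unlike INT8, no calibration pass is needed afterwards
trt_fp16_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,
    outputs=["SemanticPredictions"],
    max_batch_size=1,
    max_workspace_size_bytes=2 * (10 ** 9),
    precision_mode="FP16")

# save the optimized graph (placeholder filename)
with gfile.FastGFile('TensorRT_model_fp16.pb', 'wb') as f:
    f.write(trt_fp16_graph.SerializeToString())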

Upvotes: 0

iariav

Reputation: 53

From my experience trying to convert a DeepLab model using TRT, INT8 mode does not perform well, since there are many unsupported ops in this model: the graph gets "broken" into many small sub-graphs, and only a subset of them is converted to TRT. I was able to convert properly and speed up inference somewhat in FP16 mode.
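
As a rough diagnostic (not part of the conversion itself, just a way to see the fragmentation), you can count how many TRTEngineOp nodes TF-TRT created; assuming trt_graph is the converted GraphDef loaded as in the question's second script, many small engines means many unsupported ops broke up the graph:

engine_nodes = [n for n in trt_graph.node if n.op == "TRTEngineOp"]
native_nodes = [n for n in trt_graph.node if n.op != "TRTEngineOp"]
print("TRTEngineOp nodes:", len(engine_nodes))        # each one is a TensorRT sub-graph
print("nodes left in native TensorFlow:", len(native_nodes))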

P.S. If you still want to go with INT8, you don't necessarily need calibration files, just some input images you can run your model on for calibration.

Upvotes: 2

Pooya Davoodi

Reputation: 147

Given that you set the precision mode to INT8, I think that you are running the calibration algorithm instead of inference. The calibration algorithm is much slower than inference because it collects stats and sets the quantization ranges.

After calling create_inference_graph, you would need to call calib_graph_to_infer_graph.

See this for an example: https://github.com/tensorflow/tensorrt/blob/master/tftrt/examples/image-classification/image_classification.py#L500
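
A minimal sketch of that INT8 flow with tf.contrib.tensorrt (TF 1.x), reusing the graph and tensor names from the question; calibration_images is a hypothetical list of preprocessed input images:

import tensorflow as tf
import tensorflow.contrib.tensorrt as trt

# step 1: build the calibration graph (this is what the question's script already does)
calib_graph = trt.create_inference_graph(
    input_graph_def=tf_graphf,               # the frozen GraphDef from the question
    outputs=["SemanticPredictions"],
    max_batch_size=2,
    max_workspace_size_bytes=2 * (10 ** 9),
    precision_mode="INT8")

# step 2: run the calibration graph on representative images so TF-TRT
# can collect the statistics used to set the quantization ranges
with tf.Graph().as_default(), tf.Session() as sess:
    tf.import_graph_def(calib_graph, name='')
    output = sess.graph.get_tensor_by_name('SemanticPredictions:0')
    for img in calibration_images:           # hypothetical calibration inputs
        sess.run(output, feed_dict={'ImageTensor:0': [img]})

# step 3: replace the calibration nodes with the actual INT8 inference engines
infer_graph = trt.calib_graph_to_infer_graph(calib_graph)

# infer_graph (not calib_graph) is what should be serialized and used for inference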

Upvotes: 1
