hefe

Reputation: 213

How to create an INT8 calibration table for the TensorRT execution provider of ONNX Runtime?

I exported a PyTorch model to ONNX and want to run it with ONNX Runtime on an NVIDIA Jetson SoC. This works well with different execution providers (CPU, CUDA, and TensorRT) and different precisions (FP32 and FP16). Now, however, I want to quantize the model to INT8 to see whether this further improves performance.

The TensorRT execution provider has three configuration options for this: trt_int8_enable, trt_int8_calibration_table_name, and trt_int8_use_native_calibration_table (see https://onnxruntime.ai/docs/execution-providers/TensorRT-ExecutionProvider.html#configurations). I have done a lot of research and found descriptions of how INT8 quantization works in theory, but I have not found a conclusive manual or example showing how to create and save an INT8 calibration table for the TensorRT execution provider.
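
For context, once a calibration table exists, I plan to pass these options to the TensorRT execution provider roughly like this (a sketch; model.onnx and calibration.flatbuffers are placeholder names, and I have not verified the option values):

    import onnxruntime as ort

    # Placeholder file names; trt_int8_calibration_table_name should point at
    # the calibration table this question is about creating.
    providers = [
        ("TensorrtExecutionProvider", {
            "trt_int8_enable": True,
            "trt_int8_calibration_table_name": "calibration.flatbuffers",
            "trt_int8_use_native_calibration_table": False,
        }),
        "CUDAExecutionProvider",  # fallback for nodes TensorRT cannot handle
    ]
    session = ort.InferenceSession("model.onnx", providers=providers)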

How can I create this table using the ONNX Runtime or TensorRT Python APIs?
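
The closest I could piece together is a sketch based on the calibration utilities in the onnxruntime.quantization module, but I am not sure this is the intended way to produce the table the TensorRT execution provider expects (untested; the model path, input name, shape, and the random data reader are placeholders that would be replaced by a representative calibration dataset):

    import numpy as np
    from onnxruntime.quantization import (
        CalibrationDataReader,
        CalibrationMethod,
        create_calibrator,
        write_calibration_table,
    )

    class DummyDataReader(CalibrationDataReader):
        """Feeds a few input batches to the calibrator; replace the random
        arrays with samples from a representative calibration dataset."""
        def __init__(self, input_name, shape, num_batches=8):
            self._batches = iter(
                [{input_name: np.random.rand(*shape).astype(np.float32)}
                 for _ in range(num_batches)]
            )
        def get_next(self):
            return next(self._batches, None)

    model_path = "model.onnx"                      # placeholder
    augmented_model_path = "model_augmented.onnx"  # placeholder

    # Build a calibrator that augments the model with intermediate outputs
    # and collects activation ranges while running the calibration data.
    calibrator = create_calibrator(
        model_path,
        [],  # empty list: calibrate all supported op types
        augmented_model_path=augmented_model_path,
        calibrate_method=CalibrationMethod.MinMax,
    )
    calibrator.set_execution_providers(["CUDAExecutionProvider"])
    calibrator.collect_data(DummyDataReader("input", (1, 3, 224, 224)))

    # Writes calibration.json / calibration.flatbuffers / calibration.cache
    # into the working directory (newer onnxruntime versions may name the
    # range computation compute_data() instead of compute_range()).
    write_calibration_table(calibrator.compute_range())

If this is on the right track, I assume trt_int8_calibration_table_name should then point at the generated calibration.flatbuffers and trt_int8_use_native_calibration_table should stay False, since the table does not come from TensorRT's native calibrator, but I have not found this documented anywhere.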

Upvotes: 2

Views: 696

Answers (0)

Related Questions