Reputation: 11
I am using the OpenVINO Model Optimizer to convert an ONNX model containing a single ConvInteger operation to OpenVINO IR format.
mo --input_model {onnx_model}
The ONNX ConvInteger operator has input and weight tensors with INT8/UINT8 precision, and an output tensor with INT32 precision; INT32 is the only output precision the operator supports.
When the model is converted to OpenVINO, the input and weight tensors are automatically promoted to INT32 precision, and Convert operators are added to the model to make this change in precision.
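For context, a minimal sketch (assuming the OpenVINO Python runtime API from 2022.x and a hypothetical file name "model.xml") showing how the inserted Convert operations and their precisions can be listed in the converted IR:

from openvino.runtime import Core

core = Core()
ov_model = core.read_model("model.xml")  # hypothetical path to the converted IR

# Print every Convert node and the precision change it performs
for op in ov_model.get_ops():
    if op.get_type_name() == "Convert":
        print(op.get_friendly_name(),
              op.input(0).get_element_type(), "->",
              op.output(0).get_element_type())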
Is it possible to force INT8/UINT8 precision in the OpenVINO model? Alternatively, is there a simple way to convert the precisions back to INT8/UINT8 once the OpenVINO model has been created?
Thanks
Upvotes: 1
Views: 476
Reputation: 104
You can convert FP32 or FP16 precision into INT8 without model retraining or fine-tuning by using the OpenVINO Post-training Optimization Tool (POT). The tool supports the uniform integer quantization method.
There are two main quantization methods:
Default Quantization: the recommended method, which provides fast and accurate results in most cases. It requires only an unannotated dataset for quantization (see the sketch after this list).
Accuracy-aware Quantization: an advanced method that keeps accuracy within a predefined range, at the cost of a smaller performance improvement, in cases where Default Quantization cannot guarantee it. This method requires an annotated representative dataset and may take more time for quantization.
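For reference, a minimal sketch of the Default Quantization flow using the POT Python API shipped with OpenVINO 2022.x. The model/weights file names, the random data loader, the input shape and the stat_subset_size value are illustrative assumptions; substitute your own IR files and real calibration data:

import numpy as np
from openvino.tools.pot import (DataLoader, IEEngine, load_model, save_model,
                                compress_model_weights, create_pipeline)

class RandomDataLoader(DataLoader):
    # Placeholder calibration loader; replace with one that yields real samples.
    def __init__(self, shape, size=300):
        self._shape = shape
        self._size = size

    def __len__(self):
        return self._size

    def __getitem__(self, index):
        # (data, annotation) format; annotation can be None for Default Quantization.
        # Check the DataLoader docs for the exact format your POT version expects.
        return np.random.rand(*self._shape).astype(np.float32), None

model_config = {"model_name": "model", "model": "model.xml", "weights": "model.bin"}
engine_config = {"device": "CPU"}
algorithms = [{
    "name": "DefaultQuantization",
    "params": {"target_device": "CPU", "preset": "performance", "stat_subset_size": 300},
}]

model = load_model(model_config)
data_loader = RandomDataLoader(shape=(1, 3, 224, 224))  # assumed input shape
engine = IEEngine(config=engine_config, data_loader=data_loader)
pipeline = create_pipeline(algorithms, engine)

compressed_model = pipeline.run(model)
compress_model_weights(compressed_model)  # store weights as INT8 to shrink the .bin file
save_model(compressed_model, save_path="optimized", model_name="model_int8")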
Upvotes: 0