Reputation: 3
I trained a LeNet5 CNN with Keras/TensorFlow and used TensorFlow Lite to quantize the FP32 weights and activations to INT8. I extracted and visualized the weights, biases, scales and zero-points with Netron.
I need to implement the LeNet5 CNN in the C language. In FP32 format, my model inference works fine. However, there is a point I don't understand about running the inference in INT8 format.
In the paper "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference" (https://arxiv.org/abs/1712.05877), the authors detail the following workflow for inference with a quantized INT8 convolution:
[image: convolution INT8 inference]
After the convolution between the input and the weights, and after adding the bias, the authors explain that the downscale from INT32 to INT8 is done with the constant M defined by M = (S1 * S2) / S3.
S1 is the scale of the input. As I understand it, S2 is the scale of the weights and S3 is the scale of the output.
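To be concrete, this is how I apply that downscale step in my C code (a minimal sketch assuming a single per-tensor scale S2; the function name is mine, the float multiply stands in for the fixed-point multiplier and shift the paper uses, and the accumulator is assumed to already account for the input and weight zero-points):

#include <stdint.h>
#include <math.h>

/* Requantize one INT32 accumulator to INT8 with M = (S1 * S2) / S3.
 * Assumes input/weight zero-points are already folded into acc;
 * only the output zero-point is added here. */
static int8_t requantize(int32_t acc, float s1, float s2, float s3,
                         int32_t out_zero_point)
{
    const float m = (s1 * s2) / s3;              /* M, typically < 1 */
    int32_t v = (int32_t)roundf((float)acc * m) + out_zero_point;
    if (v < -128) v = -128;                      /* saturate to INT8 */
    if (v > 127)  v = 127;
    return (int8_t)v;
}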
However, in Netron I cannot see the scale involved for the conv2d weights:
[image: conv2d filters in Netron]
But I can see the scale for the fully connected operations:
[image: dense weights in Netron]
My question is: what is S2 for the convolution? Is it a Netron visualization problem, or am I misunderstanding the M downscaling factor for the convolution?
Thank you for your help
Upvotes: 0
Views: 294
Reputation: 111
Netron doesn't display TFLite's per-channel quantization parameters.
Conv2D weights are quantized per output channel, so there is a separate scale for each output channel. Use the TFLite interpreter to inspect them:
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path='/path/to/my_model.tflite')
# The per-channel scales and zero-points live in each tensor's details.
print([t['quantization_parameters']
       for t in interpreter.get_tensor_details()
       if t['name'] == 'sequential/conv2d_1/Conv2D'])
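In your C code, that means S2 is an array with one scale per output channel, so M differs per channel. A minimal sketch of the per-channel downscale (the function name and data layout are my assumptions, and a float multiply is used instead of the paper's fixed-point scheme):

#include <stdint.h>
#include <math.h>

/* Per-channel requantization of a conv output: channel k uses its own
 * weight scale s2[k], hence its own M[k] = (s1 * s2[k]) / s3.
 * acc holds num_channels * ch_size INT32 accumulators, channel-major. */
void requantize_per_channel(const int32_t *acc, int8_t *out,
                            int num_channels, int ch_size,
                            float s1, const float *s2, float s3,
                            int32_t out_zero_point)
{
    for (int k = 0; k < num_channels; ++k) {
        const float m = (s1 * s2[k]) / s3;       /* channel-specific M */
        for (int i = 0; i < ch_size; ++i) {
            int32_t v = (int32_t)roundf((float)acc[k * ch_size + i] * m)
                        + out_zero_point;
            if (v < -128) v = -128;              /* saturate to INT8 */
            if (v > 127)  v = 127;
            out[k * ch_size + i] = (int8_t)v;
        }
    }
}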
Upvotes: 0