C. Kim

Reputation: 41

Tensorflow Lite inference - how do I scale down the convolution layer outputs?

I built a simple CNN model with one convolutional layer (for MNIST) and converted it with TensorFlow Lite. So now my model takes 8-bit integer inputs and its weights are 8-bit integers too.

I wanted to test the parameters I got from TFLite, so I wrote C code for the inference step.

Input image pixels were 8-bit integers between 0 and 255, and the weights were 8-bit integers between -128 and 127. (Biases were 32-bit integers.) The convolution results, of course, contained numbers bigger than 255.

I checked this paper (https://arxiv.org/pdf/1712.05877.pdf, "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference") and it had some tips on what to do with this convolution result. It said I had to (1) scale down, (2) cast down (to uint8), and (3) apply the activation function to generate the 8-bit output.

To my understanding, I needed to multiply the convolution results by 2^(-n). So I divided the convolution outputs by 256, clamped the maximum to 255, and then carried the results on through the fully connected layer's weights.
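In code, my re-quantization step looked roughly like this (simplified; the divisor 256 was just my guess):

    #include <stdint.h>

    /* My naive "scale down / cast down": divide the 32-bit accumulator by a
       fixed constant and clamp to the uint8 range. 256 was an arbitrary choice. */
    uint8_t requantize_naive(int32_t acc)
    {
        int32_t scaled = acc / 256;          /* "scale down" */
        if (scaled < 0)   scaled = 0;        /* lower clamp (also acts as ReLU here) */
        if (scaled > 255) scaled = 255;      /* upper clamp */
        return (uint8_t)scaled;              /* "cast down" to uint8 */
    }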

It gave a good result (accuracy 0.96+), but it was not as high as the TFLite evaluation said (accuracy 0.98+).

I don't think I did it the right way, because the 256 I divided the convolution outputs by was an arbitrary number. In fact, when I changed it to 340 I got the best result, but it was still well below the TFLite evaluation with the TFLite Interpreter.

What is the correct and sophisticated way to implement the inference step? How do I scale down?

Upvotes: 4

Views: 1972

Answers (1)

T.J. Alumbaugh

Reputation: 56

This is a great question about the fundamentals of quantization in TF Lite. The paper you mention is a great reference and a guide for understanding the underlying math. TF Lite now uses a slightly different quantization scheme from the paper above, but still supports that scheme in full for models that were converted prior to the implementation of the current scheme. For your reference, you can see details of the new quantization scheme here:

https://www.tensorflow.org/lite/performance/quantization_spec

The answer to your question applies equally well to all quantization schemes in TF Lite. As to the particulars of your question: you want to understand how to go from the 32-bit accumulator (the result of adding up all of the activation * filter products) down to the quantized value (either uint8 or int8). From the paper, you can see that the matrix multiplication (the arithmetic is similar for the convolution case you are interested in) is done with all integer operations, except for the real-valued multiplier M defined in Equation 5 in Section 2.2. The goal of the quantization scheme is to perform all of the math operations in integer-only arithmetic, so the challenge is then 'how do I multiply by the real-valued M with only integer operations?'.
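Before getting to the integer-only version, it may help to write down what this step computes in plain floating point. This is just a sketch in my own notation; the scales and zero points come from the quantization parameters stored with each tensor, and M is the multiplier from Equation 5:

    #include <algorithm>
    #include <cmath>
    #include <cstdint>

    // Float reference for the re-quantization step. The 32-bit accumulator is
    // assumed to already include the bias. M = (input_scale * filter_scale) /
    // output_scale is the real-valued multiplier from Eq. 5 of the paper, and
    // output_zero_point belongs to the output tensor. Names here are mine.
    uint8_t RequantizeFloatReference(int32_t acc, double M, int32_t output_zero_point) {
      const double scaled = M * static_cast<double>(acc);
      int32_t result = static_cast<int32_t>(std::round(scaled)) + output_zero_point;
      result = std::max<int32_t>(0, std::min<int32_t>(255, result));  // clamp to uint8 range
      return static_cast<uint8_t>(result);
    }

The whole point of the scheme is to reproduce that multiplication by M without ever touching floating point at inference time.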

The "trick" is to represent M as is done in equation 6, as the product of 2 raised to some negative exponent times M_0, which is a real number of at least 0.5 and bound from above by 1. At first glance, that does not appear to make our problem any easier. However, first consider the 2^(-n) part. This can be represented on a computer as a just a bit shift (I'll talk about rounding in a second). Assuming any rounding issues are handled, that part is easy to do with only integer arithmetic. Now for the M_0 part. By construction, we have bound M_0 to a range where we can used a fixed point representation with an integer type (e.g. int32) and use all the bits as fractional bits (if you are not familiar with fixed point representation you might need to refer to outside sources of information).

We call the 32-bit fixed point representation of M_0 the "quantized multiplier". You can see the particulars of the operation in the link below, but, essentially, multiplying the accumulator by the quantized multiplier involves a standard integer multiplication resulting in a 64-bit number, and then taking the high 32-bits of that result.
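A scalar sketch of that operation, written in the spirit of the reference code rather than copied from it, looks like this (the one special case is when both inputs are INT32_MIN, which would overflow, so the result saturates):

    #include <cstdint>
    #include <limits>

    // Fixed-point multiply of the accumulator by the Q0.31 quantized multiplier:
    // widen to 64 bits, multiply, round, and keep the high 32 bits of the
    // doubled product.
    int32_t SaturatingRoundingDoublingHighMulSketch(int32_t a, int32_t b) {
      const bool overflow = (a == b) && (a == std::numeric_limits<int32_t>::min());
      const int64_t ab_64 = static_cast<int64_t>(a) * static_cast<int64_t>(b);
      const int32_t nudge = ab_64 >= 0 ? (1 << 30) : (1 - (1 << 30));  // round to nearest
      const int32_t result = static_cast<int32_t>((ab_64 + nudge) / (1LL << 31));
      return overflow ? std::numeric_limits<int32_t>::max() : result;
    }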

The actual code is a bit difficult to go through because there are various issues to handle: proper rounding (as discussed in the paper), overflows, value saturation, clamping, etc. You can get started on understanding it, though, by looking at the reference implementation here:

https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/kernels/internal/common.h#L153-L162

where SaturatingRoundingDoublingHighMul implements the fixed point multiplication by the quantized multiplier and RoundingDivideByPOT implements the multiplication by 2^(-n).
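And a scalar sketch of the rounding shift, plus the two pieces combined (again, illustrative rather than the library code verbatim; it assumes M < 1 so that the shift is a right shift):

    #include <cstdint>

    int32_t SaturatingRoundingDoublingHighMulSketch(int32_t a, int32_t b);  // from the snippet above

    // Divide by 2^exponent with rounding to nearest (the RoundingDivideByPOT part).
    int32_t RoundingDivideByPOTSketch(int32_t x, int exponent) {
      const int32_t mask = static_cast<int32_t>((1LL << exponent) - 1);
      const int32_t remainder = x & mask;
      const int32_t threshold = (mask >> 1) + (x < 0 ? 1 : 0);
      return (x >> exponent) + (remainder > threshold ? 1 : 0);
    }

    // The full "multiply the accumulator by M" step, in integer arithmetic only:
    // acc * M ~= RoundingDivideByPOT(SaturatingRoundingDoublingHighMul(acc, M_0_fixed), n)
    int32_t MultiplyByQuantizedMultiplierSketch(int32_t acc, int32_t quantized_multiplier,
                                                int right_shift) {
      return RoundingDivideByPOTSketch(
          SaturatingRoundingDoublingHighMulSketch(acc, quantized_multiplier), right_shift);
    }

After this multiply-and-shift you add the output zero point and clamp to [0, 255] (or [-128, 127] for int8), which is exactly the "scale down, cast down, apply activation" recipe from the paper. In other words, the divisor is not a free parameter like your 256; it is determined by the input, filter, and output scales.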

In actual code run on devices, TF Lite uses various kinds of optimized instructions to implement this arithmetic, but the reference code gets the same answer and is easier to inspect and understand. Hope that helps!

Upvotes: 4
