wickstopher

Reputation: 999

Wildly different quantization performance on tensorflow-lite conversion of keras-trained DenseNet models

I have two models that I have trained using Keras. The two models use the same architecture (the DenseNet169 implementation from the keras_applications.densenet package); however, they each have a different number of target classes (80 in one case, 200 in the other).

I'm using an identical TOCO command line for both models:

toco --graph_def_file frozen_graph.pb \
  --output_file quantized_graph.tflite \
  --inference_type FLOAT \
  --inference_input_type FLOAT \
  --output_format TFLITE \
  --input_arrays input_1 \
  --output_arrays output_node0 \
  --quantize True

My TensorFlow version is 1.11.0 (installed via pip on macOS Mojave; I have also tried the same command and environment on the Ubuntu machine I use for training, with identical results).

I'm at a complete loss as to why inference accuracy is so drastically affected for one model and not the other. This holds true across many different trainings of the same two architecture/target-class combinations. I feel like I must be missing something, but I'm baffled.

Upvotes: 3

Views: 695

Answers (1)

Chan Kha Vu

Reputation: 10400

This was intended to be just a small sneaky comment, since I'm not sure whether this will help, but it got so long that I decided to make it an answer...


My wild guess is that the accuracy drop may be caused by the variance of your network's output. After quantization (by the way, TensorFlow uses fixed-point quantization), you are playing with only 256 points (8 bits) instead of the full dense range of float32.

Most blogs around the web state that the main assumption of quantization is that weights and activations tend to lie in a small range of values. However, there is an implicit assumption that is less talked about in blogs and the literature: the activations of the network on a single sample should be decently spread across the quantized range.

Consider the following scenario where the assumption holds (a histogram of the activations of a single sample at a specific layer, with vertical lines marking the quantization points):

[figure: single-sample activation histogram spread across many quantization points]

Now consider a scenario where the second assumption is not true, but the first assumption still holds (blue is the overall value distribution, gray is the distribution for the given sample, and vertical strips are quantization points):

[figure: wide overall distribution vs. narrow single-sample distribution, with quantization points]

In the first scenario, the distribution for the given sample is covered well (by a lot of quantization points). In the second, it is covered by only two. Something similar can happen to your network: maybe for 80 classes it still has enough quantization points to distinguish the outputs, but with 200 classes we might not have enough...
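To make this concrete, here is a rough NumPy sketch of the effect (a toy 8-bit affine quantizer whose scale is fixed by the overall range; this is not TOCO's exact scheme, and all ranges and constants are made up purely for illustration):

    import numpy as np

    # Toy 8-bit affine quantizer: the scale is fixed by the OVERALL activation
    # range (as in post-training quantization), then applied to a single sample.
    def quantize_dequantize(x, overall_min, overall_max, num_levels=256):
        scale = (overall_max - overall_min) / (num_levels - 1)
        q = np.clip(np.round((x - overall_min) / scale), 0, num_levels - 1)
        return q * scale + overall_min

    overall_min, overall_max = -6.0, 6.0   # wide range seen over the whole dataset

    # Scenario 1: a sample whose activations spread over the whole range
    spread_sample = np.random.uniform(-6.0, 6.0, size=10000)
    # Scenario 2: a sample whose activations sit in a narrow band
    narrow_sample = np.random.normal(loc=0.5, scale=0.02, size=10000)

    for name, sample in [("spread", spread_sample), ("narrow", narrow_sample)]:
        deq = quantize_dequantize(sample, overall_min, overall_max)
        levels = len(np.unique(deq))
        rel_err = np.abs(deq - sample).mean() / sample.std()
        print("%s: quant levels used = %d, mean abs error / sample std = %.3f"
              % (name, levels, rel_err))

The spread sample uses essentially all 256 levels and its quantization error is negligible relative to its own spread; the narrow sample collapses onto just a handful of levels and the error becomes comparable to its standard deviation, which is exactly the failure mode described above.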

Hey, but why doesn't it affect MobileNet with 1000 classes, or even MobileNetV2, which is residual?

That's why I called it "a wild guess". Maybe MobileNet and MobileNetV2 do not have as wide an output variance as DenseNet. The former has only one input at each layer (which is already normalized by BN), while DenseNet has connections all over the place, so it can have larger variance as well as sensitivity to small changes, and BN might not help as much.


Now, try this checklist:

  • Manually collect activation statistics for both the 80-class and the 200-class model in TensorFlow, not only at the outputs but at the inner layers as well (a rough sketch for this follows the list). Are the values focused in one area, or do they spread out widely?
  • Check whether the single-input activations of the TensorFlow model spread out nicely, or whether they concentrate in one place.
  • Most importantly: look at the outputs of the quantized TF-Lite model. If there are problems with the variance as described above, this is where they will show up the most (the sketch below also runs the TF-Lite interpreter).
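A minimal sketch for the first and last items, assuming a trained Keras model saved as densenet169_200cls.h5 and the quantized_graph.tflite produced by TOCO (the file names, probed layers, and random input are placeholders; replace them with your real model and preprocessed images):

    import numpy as np
    import tensorflow as tf
    from keras.models import Model, load_model

    # --- 1. Activation statistics of the float Keras model (inner layers + output) ---
    model = load_model("densenet169_200cls.h5")            # placeholder path
    # Probe a few layers spread through the network, plus the final output.
    probe_layers = [model.layers[len(model.layers) // 3],
                    model.layers[2 * len(model.layers) // 3],
                    model.layers[-1]]
    probe = Model(inputs=model.input, outputs=[l.output for l in probe_layers])

    x = np.random.rand(8, 224, 224, 3).astype(np.float32)  # replace with real preprocessed images
    for layer, act in zip(probe_layers, probe.predict(x)):
        print("%s: min=%.4f max=%.4f mean=%.4f std=%.4f"
              % (layer.name, act.min(), act.max(), act.mean(), act.std()))

    # --- 2. Outputs of the quantized TF-Lite model on the same inputs ---
    try:
        Interpreter = tf.lite.Interpreter            # TF >= 1.13
    except AttributeError:
        Interpreter = tf.contrib.lite.Interpreter    # TF 1.11

    interpreter = Interpreter(model_path="quantized_graph.tflite")
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    out = interpreter.get_output_details()[0]

    for sample in x:
        interpreter.set_tensor(inp["index"], sample[None, ...])
        interpreter.invoke()
        tflite_out = interpreter.get_tensor(out["index"])[0]
        print("tflite top-5:", np.argsort(tflite_out)[-5:][::-1],
              "max score: %.4f" % tflite_out.max())

Comparing the per-layer statistics of the 80-class and 200-class models, and then the float vs. quantized outputs on the same images, should show whether the quantized model is collapsing many classes onto the same few output values.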

PS: please share your results as well; I think many people will be interested in troubleshooting quantization issues :)

Upvotes: 4
