Reputation: 999
I have two models that I have trained using Keras. The two models use the same architecture (the DenseNet169 implementation from the keras_applications.densenet package), but each has a different number of target classes (80 in one case, 200 in the other).
Converting both models to .pb format works just fine (identical performance in inference). I use the keras_to_tensorflow utility found at https://github.com/amir-abdi/keras_to_tensorflow
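For reference, my understanding is that the freezing step is roughly equivalent to the TF 1.x sketch below (the .h5 path is hypothetical and I'm assuming a single output node, which is what the utility produces for me):

# Minimal sketch of freezing a Keras model to a frozen .pb graph in TF 1.x.
# The model path is a placeholder; output node names come from the model itself.
import tensorflow as tf
from tensorflow.keras import backend as K

model = tf.keras.models.load_model('densenet169_200cls.h5')  # hypothetical path

sess = K.get_session()
output_names = [out.op.name for out in model.outputs]

# Fold variables into constants so the graph is self-contained
frozen_graph_def = tf.graph_util.convert_variables_to_constants(
    sess, sess.graph.as_graph_def(), output_names)

tf.train.write_graph(frozen_graph_def, '.', 'frozen_graph.pb', as_text=False)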
Converting both models to .tflite format using TOCO works just fine (again, identical performance in inference).
Converting the 80-class model to .tflite using quantization in TOCO works reasonably well (<1% drop in top-3 accuracy).
Converting the 200-class model to .tflite using quantization in TOCO goes off the rails (~30% drop in top-3 accuracy).
I'm using an identical TOCO command line for both models:
toco --graph_def_file frozen_graph.pb \
--output_file quantized_graph.tflite \
--inference_type FLOAT \
--inference_input_type FLOAT \
--output_format TFLITE \
--input_arrays input_1 \
--output_arrays output_node0 \
--quantize True
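For completeness, I believe the rough Python-API equivalent of this conversion looks like the sketch below (assuming a TF 1.x version where tf.lite.TFLiteConverter exists; older 1.x releases used tf.contrib.lite.TocoConverter instead, and I'm not certain --quantize True maps exactly onto post_training_quantize):

# Sketch: post-training weight quantization via the Python API instead of the toco CLI.
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_frozen_graph(
    'frozen_graph.pb',
    input_arrays=['input_1'],
    output_arrays=['output_node0'])
converter.post_training_quantize = True  # weight-only post-training quantization

tflite_model = converter.convert()
with open('quantized_graph.tflite', 'wb') as f:
    f.write(tflite_model)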
My TensorFlow version is 1.11.0 (installed via pip on macOS Mojave, although I have also tried the same command/environment on the Ubuntu machine I use for training, with identical results).
I'm at a complete loss as to why the accuracy of inference is so drastically affected for one model and not the other. This holds true for many different trainings of the same two architecture/target class combinations. I feel like I must be missing something, but I'm baffled.
Upvotes: 3
Views: 695
Reputation: 10400
This was intended to be just a small sneaky comment since I'm not sure if this can help, but then it got so long that I decided to make it an answer...
My wild guess is that the accuracy drop may be caused by the variance of your network's output. After quantization (by the way, TensorFlow uses fixed-point quantization), you are working with only 256 points (8 bits) instead of the full dense range of float32.
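To make that concrete, here is a tiny sketch of what 8-bit affine (fixed-point) quantization does to a tensor; the scale/offset scheme below is the generic textbook one, not necessarily the exact kernel TFLite uses:

# Toy illustration of 8-bit quantization: map a float range onto 256 integer
# levels, map back, and look at the rounding error that is introduced.
import numpy as np

def quantize_dequantize(x, num_levels=256):
    lo, hi = x.min(), x.max()
    scale = (hi - lo) / (num_levels - 1)
    q = np.round((x - lo) / scale)   # integers in [0, 255]
    return q * scale + lo            # back to float

x = np.random.normal(0.0, 1.0, size=10000).astype(np.float32)
x_hat = quantize_dequantize(x)
print('max abs error:', np.abs(x - x_hat).max())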
Most blog posts state that the main assumption of quantization is that weights and activations tend to lie in a small range of values. However, there is an implicit second assumption that is less discussed in blogs and the literature: the activations of the network on a single sample should be decently spread across the quantized range.
Consider the following scenario where that assumption holds (a histogram of activations for a single sample at a specific layer; the vertical lines are quantization points):
Now consider a scenario where the second assumption does not hold, but the first one still does (blue is the overall value distribution, gray is the distribution for a given sample, and the vertical strips are quantization points):
In the first scenario, the distribution for the given sample is covered well (by many quantization points). In the second, it is covered by only two. The same thing can happen to your network: maybe with 80 classes there are still enough quantization points to distinguish between outputs, but with 200 classes there might not be enough...
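A quick way to see this collapse numerically (purely illustrative numbers, not taken from your models):

# If quantization ranges are calibrated on a wide overall distribution, a sample
# whose activations sit in a narrow band ends up with only a few distinct levels.
import numpy as np

overall_min, overall_max = -10.0, 10.0           # wide calibration range
scale = (overall_max - overall_min) / 255.0

sample = np.random.normal(0.2, 0.05, size=200)   # narrow per-sample spread
q = np.round((sample - overall_min) / scale)
print('distinct quantized values for this sample:', len(np.unique(q)))
# -> typically only a handful of levels, so fine differences between outputs are lost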
Hey, but then why doesn't it affect MobileNet with 1000 classes, or even MobileNetV2, which is residual?
That's why I called it "a wild guess". Maybe MobileNet and MobileNetV2 do not have as wide an output variance as DenseNet. The former has only one input at each layer (which is already normalized by BN), while DenseNet has connections all over the place, so it can have a larger variance as well as sensitivity to small changes, and BN might not help as much.
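If you want to check this hypothesis on your own models, you could dump the activation ranges layer by layer and compare the 80-class and 200-class networks. A rough sketch (layer selection and the sample batch are up to you):

# Rough diagnostic: print per-layer activation spread for a batch of samples.
import numpy as np
from tensorflow.keras import backend as K

def activation_ranges(model, x_batch):
    outputs = [layer.output for layer in model.layers]
    fn = K.function([model.input], outputs)
    for layer, act in zip(model.layers, fn([x_batch])):
        act = np.asarray(act)
        print(f'{layer.name:30s} min={act.min():+.3f} max={act.max():+.3f} std={act.std():.3f}')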
Now, try this checklist:
PS: please share your results as well, I think many will be interested in troubleshooting quantization issues :)
Upvotes: 4