Reputation: 999
I have two models that I have trained using Keras. The two models use the same architecture (the DenseNet169 implementation from the keras_applications.densenet package), but each has a different number of target classes (80 in one case, 200 in the other).
Converting both models to .pb format works just fine (identical performance in inference). I use the keras_to_tensorflow utility found at https://github.com/amir-abdi/keras_to_tensorflow
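For reference, my understanding is that the freezing step is roughly equivalent to the TF 1.x sketch below (the .h5 path is hypothetical and I'm assuming a single output node, which is what the utility produces for me):

# Minimal sketch of freezing a Keras model to a frozen .pb graph in TF 1.x.
# The model path is a placeholder; output node names come from the model itself.
import tensorflow as tf
from tensorflow.keras import backend as K

model = tf.keras.models.load_model('densenet169_200cls.h5')  # hypothetical path

sess = K.get_session()
output_names = [out.op.name for out in model.outputs]

# Fold variables into constants so the graph is self-contained
frozen_graph_def = tf.graph_util.convert_variables_to_constants(
    sess, sess.graph.as_graph_def(), output_names)

tf.train.write_graph(frozen_graph_def, '.', 'frozen_graph.pb', as_text=False)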
Converting both models to .tflite format using TOCO works just fine (again, identical performance in inference).
Converting the 80-class model to .tflite using quantization in TOCO works reasonably well (<1% drop in top-3 accuracy).
Converting the 200-class model to .tflite using quantization in TOCO goes off the rails (~30% drop in top-3 accuracy).
I'm using an identical TOCO command line for both models:
toco --graph_def_file frozen_graph.pb \
--output_file quantized_graph.tflite \
--inference_type FLOAT \
--inference_input_type FLOAT \
--output_format TFLITE \
--input_arrays input_1 \
--output_arrays output_node0 \
--quantize True
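For completeness, I believe the rough Python-API equivalent of this conversion looks like the sketch below (assuming a TF 1.x version where tf.lite.TFLiteConverter exists; older 1.x releases used tf.contrib.lite.TocoConverter instead, and I'm not certain --quantize True maps exactly onto post_training_quantize):

# Sketch: post-training weight quantization via the Python API instead of the toco CLI.
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_frozen_graph(
    'frozen_graph.pb',
    input_arrays=['input_1'],
    output_arrays=['output_node0'])
converter.post_training_quantize = True  # weight-only post-training quantization

tflite_model = converter.convert()
with open('quantized_graph.tflite', 'wb') as f:
    f.write(tflite_model)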
My TensorFlow version is 1.11.0 (installed via pip on macOS Mojave, although I have also tried the same command/environment on the Ubuntu machine I use for training, with identical results).
I'm at a complete loss as to why the accuracy of inference is so drastically affected for one model and not the other. This holds true for many different trainings of the same two architecture/target class combinations. I feel like I must be missing something, but I'm baffled.
Upvotes: 3
Views: 695
Reputation: 10400
This was intended to be just a small sneaky comment since I'm not sure if this can help, but then it got so long that I decided to make it an answer...
My wild guess is that the accuracy drop may be caused by the variance of your network's output. After quantization (by the way, TensorFlow uses fixed-point quantization), you are working with only 256 points (8 bits) instead of the full dense range of float32.
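To make that concrete, here is a tiny sketch of what 8-bit affine (fixed-point) quantization does to a tensor; the scale/offset scheme below is the generic textbook one, not necessarily the exact kernel TFLite uses:

# Toy illustration of 8-bit quantization: map a float range onto 256 integer
# levels, map back, and look at the rounding error that is introduced.
import numpy as np

def quantize_dequantize(x, num_levels=256):
    lo, hi = x.min(), x.max()
    scale = (hi - lo) / (num_levels - 1)
    q = np.round((x - lo) / scale)   # integers in [0, 255]
    return q * scale + lo            # back to float

x = np.random.normal(0.0, 1.0, size=10000).astype(np.float32)
x_hat = quantize_dequantize(x)
print('max abs error:', np.abs(x - x_hat).max())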
Most blog posts state that the main assumption of quantization is that weights and activations tend to lie in a small range of values. However, there is an implicit second assumption that is less discussed in blogs and the literature: the activations of the network on a single sample should be decently spread across the quantized range.
Consider the following scenario where that assumption holds (a histogram of activations for a single sample at a specific layer; the vertical lines are quantization points):
Now consider a scenario where the second assumption does not hold, but the first one still does (blue is the overall value distribution, gray is the distribution for a given sample, and the vertical strips are quantization points):
In the first scenario, the distribution for the given sample is covered well (by many quantization points). In the second, it is covered by only two. The same thing can happen to your network: maybe with 80 classes there are still enough quantization points to distinguish between outputs, but with 200 classes there might not be enough...
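A quick way to see this collapse numerically (purely illustrative numbers, not taken from your models):

# If quantization ranges are calibrated on a wide overall distribution, a sample
# whose activations sit in a narrow band ends up with only a few distinct levels.
import numpy as np

overall_min, overall_max = -10.0, 10.0           # wide calibration range
scale = (overall_max - overall_min) / 255.0

sample = np.random.normal(0.2, 0.05, size=200)   # narrow per-sample spread
q = np.round((sample - overall_min) / scale)
print('distinct quantized values for this sample:', len(np.unique(q)))
# -> typically only a handful of levels, so fine differences between outputs are lost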
Hey, but then why doesn't it affect MobileNet with 1000 classes, or even MobileNetV2, which is residual?
That's why I called it "a wild guess". Maybe MobileNet and MobileNetV2 do not have as wide an output variance as DenseNet. The former has only one input at each layer (which is already normalized by BN), while DenseNet has connections all over the place, so it can have a larger variance as well as sensitivity to small changes, and BN might not help as much.
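If you want to check this hypothesis on your own models, you could dump the activation ranges layer by layer and compare the 80-class and 200-class networks. A rough sketch (layer selection and the sample batch are up to you):

# Rough diagnostic: print per-layer activation spread for a batch of samples.
import numpy as np
from tensorflow.keras import backend as K

def activation_ranges(model, x_batch):
    outputs = [layer.output for layer in model.layers]
    fn = K.function([model.input], outputs)
    for layer, act in zip(model.layers, fn([x_batch])):
        act = np.asarray(act)
        print(f'{layer.name:30s} min={act.min():+.3f} max={act.max():+.3f} std={act.std():.3f}')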
Now, try this checklist:
PS: please share your results as well, I think many will be interested in troubleshooting quantization issues :)
Upvotes: 4