Reputation: 3974
These days I am trying to track down an error concerning the deployment of a TF model with TPU support.
I can get a model without TPU support running, but as soon as I enable quantization, I get lost.
I am in the following situation:
For the last point, I used the TFLiteConverter's Python API. The script that produces a functional tflite model is:
import tensorflow as tf
graph_def_file = 'frozen_model.pb'
inputs = ['dense_input']
outputs = ['dense/BiasAdd']
converter = tf.lite.TFLiteConverter.from_frozen_graph(graph_def_file, inputs, outputs)
converter.inference_type = tf.lite.constants.FLOAT
input_arrays = converter.get_input_arrays()
converter.optimizations = [tf.lite.Optimize.OPTIMIZE_FOR_SIZE]
tflite_model = converter.convert()
open('model.tflite', 'wb').write(tflite_model)
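For reference, this is roughly how I sanity-check the converted model with the TFLite Python interpreter (the random dummy input is just for illustration):
import numpy as np
import tensorflow as tf

# quick sanity check of the converted model with random dummy input
interpreter = tf.lite.Interpreter(model_path='model.tflite')
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
dummy_input = np.random.rand(*input_details[0]['shape']).astype(np.float32)
interpreter.set_tensor(input_details[0]['index'], dummy_input)
interpreter.invoke()
print(interpreter.get_tensor(output_details[0]['index']))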
This tells me that my approach seems to be OK up to this point. Now, if I want to utilize the Coral TPU stick, I have to quantize my model (I took that into account during training). All I have to do is modify my converter script. I figured that I have to change it to:
import tensorflow as tf
graph_def_file = 'frozen_model.pb'
inputs = ['dense_input']
outputs = ['dense/BiasAdd']
converter = tf.lite.TFLiteConverter.from_frozen_graph(graph_def_file, inputs, outputs)
converter.inference_type = tf.lite.constants.QUANTIZED_UINT8 ## Indicates TPU compatibility
input_arrays = converter.get_input_arrays()
converter.quantized_input_stats = {input_arrays[0]: (0., 1.)} ## mean, std_dev
converter.default_ranges_stats = (-128, 127) ## min, max values for quantization (?)
converter.allow_custom_ops = True ## not sure if this is needed
## REMOVED THE OPTIMIZATIONS ALTOGETHER TO MAKE IT WORK
tflite_model = converter.convert()
open('model.tflite', 'wb').write(tflite_model)
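For what it's worth, my current understanding of quantized_input_stats (pieced together from various sources, not from official documentation) is that the converter assumes real_value = (quantized_uint8_value - mean) / std_dev, so something like this should hold:
float_min, float_max = 0.0, 1.0            ## example input range only, not my actual values
std_dev = 255.0 / (float_max - float_min)  ## uint8 steps per unit of the real range
mean = -float_min * std_dev                ## uint8 value that maps back to real 0.0
converter.quantized_input_stats = {input_arrays[0]: (mean, std_dev)}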
This tflite model produces results when loaded with the interpreter's Python API, but I am not able to understand their meaning. Also, there is no documentation (or if there is, it is well hidden) on how to choose mean, std_dev and the min/max ranges. Furthermore, after compiling this model with the edgetpu_compiler and deploying it (loading it with the C++ API), I receive the following error:
INFO: Initialized TensorFlow Lite runtime.
ERROR: Failed to prepare for TPU. generic::failed_precondition: Custom op already assigned to a different TPU.
ERROR: Node number 0 (edgetpu-custom-op) failed to prepare.
Segmentation fault
I suppose I missed a flag or something during the conversion process. But as the documentation is also lacking here, I can't say for sure.
In short:
I am grateful for any help or guidance!
EDIT: I have opened a GitHub issue with the full test code. Feel free to play around with this.
Upvotes: 31
Views: 1910
Reputation: 6367
You should never need to manually set the quantization stats.
Have you tried the post-training-quantization tutorials?
https://www.tensorflow.org/lite/performance/post_training_integer_quant
Basically they set the quantization options:
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
Then they pass a "representative dataset" to the converter, so that the converter can run the model for a few batches to gather the necessary statistics:
def representative_data_gen():
  for input_value in mnist_ds.take(100):
    yield [input_value]
converter.representative_dataset = representative_data_gen
While there are options for quantization-aware training, it's always easier to do post-training quantization.
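Putting the pieces together for your frozen graph, a minimal sketch could look like this (the calibration array shape is just a placeholder for your real input shape, and the exact attribute names can vary slightly between TF versions, so treat it as a starting point):
import numpy as np
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_frozen_graph(
    'frozen_model.pb', ['dense_input'], ['dense/BiasAdd'])

# post-training integer quantization
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8

# representative dataset: placeholder random samples standing in for real training data
calibration_data = np.random.rand(100, 10).astype(np.float32)  # (num_samples, num_features)

def representative_data_gen():
  for sample in calibration_data:
    yield [sample[np.newaxis, :]]

converter.representative_dataset = representative_data_gen

tflite_model = converter.convert()
open('model_quant.tflite', 'wb').write(tflite_model)
The resulting fully quantized model can then be passed to the edgetpu_compiler as before.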
Upvotes: 1