magenta

Reputation: 139

TensorFlow Lite: conversion error for a quantized GraphDef

I followed the tutorial (https://www.tensorflow.org/performance/quantization) to generate the quantized GraphDef file:

curl -L "https://storage.googleapis.com/download.tensorflow.org/models/inception_v3_2016_08_28_frozen.pb.tar.gz" | tar -C tensorflow/examples/label_image/data -xz
bazel build tensorflow/tools/graph_transforms:transform_graph
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
  --in_graph=tensorflow/examples/label_image/data/inception_v3_2016_08_28_frozen.pb \
  --out_graph=/tmp/inception_v3_quantized_graph.pb \
  --inputs=input \
  --outputs=InceptionV3/Predictions/Reshape_1 \
  --transforms='add_default_attributes strip_unused_nodes(type=float, shape="1,299,299,3")
remove_nodes(op=Identity, op=CheckNumerics) fold_constants(ignore_errors=true)
fold_batch_norms fold_old_batch_norms quantize_weights quantize_nodes
strip_unused_nodes sort_by_execution_order'

Then I converted the quantized GraphDef to a TFLite file:

bazel-bin/tensorflow/contrib/lite/toco/toco \
  --input_file=/tmp/inception_v3_quantized_graph.pb \
  --output_file=/tmp/inception_v3_quantized_graph.lite \
  --input_format=TENSORFLOW_GRAPHDEF \
  --output_format=TFLITE \
  --inference_type=QUANTIZED_UINT8 \
  --input_type=QUANTIZED_UINT8 \
  --input_shape=1,299,299,3 \
  --input_array=input \
  --output_array=InceptionV3/Predictions/Reshape_1 \
  --mean_value=128 \
  --std_value=127

It failed with the error:

2017-11-23 12:36:40.637143: F tensorflow/contrib/lite/toco/tooling_util.cc:549] Check failed: model.arrays.count(input_array.name()) Input array not found: input
Aborted (core dumped)

I ran the summarize_graph tool:

bazel-bin/tensorflow/tools/graph_transforms/summarize_graph \
  --in_graph=/tmp/inception_v3_quantized_graph.pb

The input node does exist:

Found 1 possible inputs: (name=input, type=float(1), shape=[1,299,299,3])
No variables spotted.
Found 1 possible outputs: (name=InceptionV3/Predictions/Reshape_1, op=Dequantize)
Found 23824934 (23.82M) const parameters, 0 (0) variable parameters, and 268 control_edges
Op types used: 673 Const, 214 Requantize, 214 RequantizationRange, 134 Reshape, 134 Max, 134 Min, 134 QuantizeV2, 95 QuantizedConv2D, 94 QuantizedRelu, 94 QuantizedAdd, 49 Dequantize, 24 QuantizedMul, 15 ConcatV2, 10 QuantizedAvgPool, 4 QuantizedMaxPool, 2 QuantizedReshape, 1 QuantizedBiasAdd, 1 Placeholder, 1 Softmax, 1 Squeeze

Did I miss anything? What is the right way to convert a quantized GraphDef file into a TFLite file?

Thanks!

Upvotes: 2

Views: 3603

Answers (1)

Benoit Jacob

Reputation: 31

I reproduced your command lines with a clean checkout of TF (precise commit).

I do get a toco error, but not the same one as yours:

F tensorflow/contrib/lite/toco/tooling_util.cc:1155] Array InceptionV3/InceptionV3/Conv2d_1a_3x3/BatchNorm/batchnorm/mul_eightbit/input__port__0/min, which is an input to the (Unsupported TensorFlow op: QuantizeV2) operator producing the output array InceptionV3/InceptionV3/Conv2d_1a_3x3/BatchNorm/batchnorm/mul_eightbit/input__port__0/quantize, is lacking min/max data, which is necessary for quantization. Either target a non-quantized output format, or change the input graph to contain min/max information, or pass --default_ranges_min= and --default_ranges_max= if you do not care about the accuracy of results.

This is not a toco bug; rather, toco is complaining about two issues in this file, inception_v3_quantized_graph.pb:

  1. An array ('tensor' in TensorFlow parlance) is lacking min-max range information. That is the immediate cause of this error message.
  2. An operator in this graph is QuantizeV2. Toco doesn't know about this type of operator. This isn't the immediate subject of this error message, but it is a real issue that you would run into later if you got past this point.

TensorFlow Lite uses a new quantization approach that is not the same as the one in the existing TensorFlow documentation and tooling you mention. The errors you're getting here boil down to feeding the TensorFlow Lite converter, which expects graphs quantized with the new approach, a graph quantized with the old approach.

We are in the process of documenting the new quantization approach.

Meanwhile, you might already be able to experiment with it, with a few hints. The new quantization approach requires inserting "fake quantization" nodes into your float training graph, as described here: https://www.tensorflow.org/versions/r0.12/api_docs/python/array_ops/fake_quantization

The purpose of these nodes is to accurately simulate the accuracy impact of 8-bit quantization during training, and to record the exact min-max ranges used for that quantization. It is essential to place these nodes in the right places: the point of quantized training is to allow reproducing exactly the same arithmetic at inference time, and quantized inference implements whole fused layers (Conv + BiasAdd + ReLU, FullyConnected + BiasAdd + ReLU) as single fused operations. Accordingly, fake_quant nodes should be placed (see the sketch after this list):

  • On the output activations of each (fused) layer (e.g. Conv + BiasAdd + ReLU), after the activation function, i.e. on the output of the ReLU; not before it, and not around the BiasAdd.
  • On the Conv/fully-connected weights, just before they are consumed by the Conv/FullyConnected op.
  • Do not place fake_quant nodes on the bias vectors.
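
To make the placement concrete, here is a minimal sketch of a single fused Conv + BiasAdd + ReLU layer with fake_quant nodes in the places described above. The shapes and min/max ranges are made-up placeholders, not values from this model; a real training graph would normally use the variable-range variant (tf.fake_quant_with_min_max_vars) so the ranges can be learned during training:

    import tensorflow as tf

    # Shapes and min/max ranges below are illustrative assumptions only.
    inputs = tf.placeholder(tf.float32, [1, 299, 299, 3])
    weights = tf.Variable(tf.truncated_normal([3, 3, 3, 32], stddev=0.1))
    bias = tf.Variable(tf.zeros([32]))

    # Fake-quantize the weights just before the Conv consumes them.
    quant_weights = tf.fake_quant_with_min_max_args(weights, min=-1.0, max=1.0)

    conv = tf.nn.conv2d(inputs, quant_weights,
                        strides=[1, 2, 2, 1], padding='VALID')
    biased = tf.nn.bias_add(conv, bias)  # no fake_quant on the bias vector
    relu = tf.nn.relu(biased)

    # Fake-quantize the output activations after the activation function,
    # not before it and not around the BiasAdd.
    outputs = tf.fake_quant_with_min_max_args(relu, min=0.0, max=6.0)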

That only scratches the surface of this complicated topic, which is why it is taking us some time to produce good documentation for it! On the plus side, it should generally be possible to take a trial-and-error approach, letting toco's error messages guide you toward the correct placement of fake_quant nodes.

Once you have placed the fake_quant nodes, retrain as usual in TensorFlow, run freeze_graph as usual, then run toco as in this example: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/lite/toco/g3doc/cmdline_examples.md#convert-a-tensorflow-graphdef-to-tensorflow-lite-for-quantized-inference

Also note that if you're only interested in evaluating performance and don't care about actual accuracy, you can use 'dummy quantization': running toco quantization directly on a plain float graph, passing --default_ranges_min and --default_ranges_max (mentioned in the error message above) so that the missing min-max ranges are filled with a fabricated default, without having to deal with fake quantization and retraining. Just don't use that in your actual application! https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/lite/toco/g3doc/cmdline_examples.md#use-dummy-quantization-to-try-out-quantized-inference-on-a-float-graph
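
For concreteness, a dummy-quantization invocation might look like the sketch below. This is assembled from the command in your question and untested here; note that it takes the original float graph as input (not the transform_graph output), and the default range values are arbitrary placeholders that will give meaningless accuracy:

    bazel-bin/tensorflow/contrib/lite/toco/toco \
      --input_file=tensorflow/examples/label_image/data/inception_v3_2016_08_28_frozen.pb \
      --output_file=/tmp/inception_v3_dummy_quantized.lite \
      --input_format=TENSORFLOW_GRAPHDEF \
      --output_format=TFLITE \
      --inference_type=QUANTIZED_UINT8 \
      --input_shape=1,299,299,3 \
      --input_array=input \
      --output_array=InceptionV3/Predictions/Reshape_1 \
      --default_ranges_min=0 \
      --default_ranges_max=6 \
      --mean_value=128 \
      --std_value=127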

Good luck!

Upvotes: 2
