Reputation: 41
Dear developers and NN enthusiasts, I have quantized a model (8-bit post-training quantization) and I'm trying to run inference on the resulting model using the tflite interpreter.
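My conversion and inference setup is essentially the standard full-integer post-training quantization flow; here is a minimal sketch with a placeholder model and dummy representative data (not my actual code):

import numpy as np
import tensorflow as tf

# Placeholder standing in for the real trained float model (illustrative only).
model = tf.keras.Sequential([tf.keras.layers.Dense(34, input_shape=(34,))])

# Dummy representative dataset used to calibrate the int8 ranges.
def representative_dataset():
    for _ in range(100):
        yield [np.random.rand(1, 34).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
tflite_model = converter.convert()

# Inference with the quantized model; allocate_tensors() is where it fails.
interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()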
In some cases the interpreter runs properly and I can do inference on the quantized model as expected, with outputs close enough to those of the original model, so my setup appears to be correct. However, depending on the specific quantized model, I frequently stumble across the following RuntimeError:
Traceback (most recent call last):
File ".\quantize_model.py", line 328, in <module>
interpreter.allocate_tensors()
File "---path removed---tf-nightly_py37\lib\site-packages\tensorflow\lite\python\interpreter.py", line 243, in allocate_tensors
return self._interpreter.AllocateTensors()
RuntimeError: tensorflow/lite/kernels/kernel_util.cc:154 scale_diff / output_scale <= 0.02 was not true.Node number 26 (FULLY_CONNECTED) failed to prepare.
Since the error appears to be related to the scale of the bias, I have retrained the original model using a bias_regularizer. However, the error persists.
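(The retraining change amounted to something like the following, with the regularization factor and layer chosen only for illustration:)

from tensorflow.keras import layers, regularizers

# Illustrative only: an L2 penalty on the biases of a dense layer.
dense = layers.Dense(34, bias_regularizer=regularizers.l2(1e-4))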
Do you have any suggestion on how to avoid this error? Should I train or design the model in a different way? Is it possible to suppress this error and continue as usual (even if the accuracy is reduced)?
I have used Netron to extract some details regarding 'node 26' from the quantized tflite model:
*Node properties -> type: FullyConnected, location: 26.
*Attributes -> asymmetric_quantization: false, fused_activation: NONE, keep_num_dims: false, weights_format: DEFAULT.
*Inputs ->
input. name: functional_3/tf_op_layer_Reshape/Reshape;StatefulPartitionedCall/functional_3/tf_op_layer_Reshape/Reshape
type: int8[1,34]
quantization: 0 ≤ 0.007448929361999035 * (q - -128) ≤ 1.8994770050048828
location: 98
weights. name: functional_3/tf_op_layer_MatMul_54/MatMul_54;StatefulPartitionedCall/functional_3/tf_op_layer_MatMul_54/MatMul_54
type: int8[34,34]
quantization: -0.3735211491584778 ≤ 0.002941111335530877 * q ≤ 0.1489555984735489
location: 42
[weights omitted to save space]
bias. name: functional_3/tf_op_layer_AddV2_93/AddV2_3/y;StatefulPartitionedCall/functional_3/tf_op_layer_AddV2_93/AddV2_3/y
type: int32[34]
quantization: 0.0002854724007192999 * q
location: 21
[13,-24,-19,-9,4,59,-18,9,14,-15,13,6,12,5,10,-2,-14,16,11,-1,12,7,-4,16,-8,6,-17,-7,9,-15,7,-29,5,3]
*Outputs ->
output. name: functional_3/tf_op_layer_AddV2/AddV2;StatefulPartitionedCall/functional_3/tf_op_layer_AddV2/AddV2;functional_3/tf_op_layer_Reshape_99/Reshape_99/shape;StatefulPartitionedCall/functional_3/tf_op_layer_Reshape_99/Reshape_99/shape;functional_3/tf_op_layer_Reshape_1/Reshape_1;StatefulPartitionedCall/functional_3/tf_op_layer_Reshape_1/Reshape_1;functional_3/tf_op_layer_AddV2_93/AddV2_3/y;StatefulPartitionedCall/functional_3/tf_op_layer_AddV2_93/AddV2_3/y
type: int8[1,34]
quantization: -0.46506571769714355 ≤ 0.0031077787280082703 * (q - 22) ≤ 0.32741788029670715
location: 99
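As a sanity check: TFLite's full-integer quantization expects the bias scale of a fully-connected node to be approximately input_scale * weights_scale. Plugging in the numbers above shows that node 26 is far from that (a few lines of throwaway Python, values copied from the Netron dump):

input_scale = 0.007448929361999035
weights_scale = 0.002941111335530877
bias_scale = 0.0002854724007192999
output_scale = 0.0031077787280082703

expected_bias_scale = input_scale * weights_scale   # ~2.19e-05
scale_diff = abs(expected_bias_scale - bias_scale)  # ~2.64e-04
print(scale_diff / output_scale)                    # ~0.085, well above the 0.02 from the error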
Upvotes: 3
Views: 2123
Reputation: 1600
I have another approach that overcame my problem and want to share it with you. According to the quantization file, activation quantization is only supported for Relu and Identity. It may fail if the BiasAdd before the Relu activation is missed, so we can wrap that layer with tf.identity to bypass the problem. I have tried it and it works for my case without editing anything in the cpp files.
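A minimal sketch of the idea, assuming the affected MatMul/BiasAdd is built from raw TF ops as in the question (shapes and values are illustrative, not taken from the original model):

import numpy as np
import tensorflow as tf

inputs = tf.keras.Input(shape=(34,))
w = tf.constant(np.random.rand(34, 34).astype(np.float32))
b = tf.constant(np.zeros(34, dtype=np.float32))
x = tf.matmul(inputs, w) + b   # raw MatMul + bias-add, like the failing node
x = tf.identity(x)             # wrap the add in an Identity before the activation
outputs = tf.nn.relu(x)
model = tf.keras.Model(inputs, outputs)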
Upvotes: 0
Reputation: 13
I have found a workaround, which involves manually modifying the quantized tflite model. This is the file that triggers the RuntimeError in question (tensorflow/lite/kernels/kernel_util.cc):
// TODO(ahentz): The following conditions must be guaranteed by the training pipeline.
...
const double scale_diff = std::abs(input_product_scale - bias_scale);
const double output_scale = static_cast<double>(output->params.scale);
TF_LITE_ENSURE(context, scale_diff / output_scale <= 0.02);
The comment makes clear that some functionality in model quantization still needs to be completed. The failing condition concerns the scale of the bias: the check expects the bias scale to be (almost) equal to input_scale * weights_scale, relative to the output scale. I verified that my quantized model does not fulfill this constraint. To manually fix the quantized model, the bias quantization parameters of the offending node can be rewritten so that the constraint holds, e.g. by setting the bias scale to input_scale * weights_scale and requantizing the stored int32 bias values; a sketch of that recomputation follows.
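The actual patching of the .tflite flatbuffer depends on your tooling, so the snippet below only shows the recomputation itself, with the scales and bias values for node 26 copied from the Netron dump in the question (treat it as an illustration, not a drop-in script):

import numpy as np

# Quantization parameters of node 26, taken from the question's Netron dump.
input_scale = 0.007448929361999035
weights_scale = 0.002941111335530877
old_bias_scale = 0.0002854724007192999
old_bias_q = np.array([13, -24, -19, -9, 4, 59, -18, 9, 14, -15, 13, 6, 12, 5,
                       10, -2, -14, 16, 11, -1, 12, 7, -4, 16, -8, 6, -17, -7,
                       9, -15, 7, -29, 5, 3], dtype=np.int32)

# kernel_util.cc expects bias_scale ~= input_scale * weights_scale.
new_bias_scale = input_scale * weights_scale

# Requantize the int32 bias so it represents the same real values under the
# new scale (real value = q * scale must stay approximately unchanged).
real_bias = old_bias_q.astype(np.float64) * old_bias_scale
new_bias_q = np.round(real_bias / new_bias_scale).astype(np.int32)

# Writing new_bias_scale and new_bias_q back into the flatbuffer for node 26
# should make scale_diff ~0, so allocate_tensors() no longer trips the check.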
Of course, this solution is only a temporary workaround, useful until the code in TensorFlow's quantizer is corrected.
Upvotes: 1