James

Reputation: 4052

Converting tf.keras model to TFLite: Model is slow and doesn't work with XNN Pack

Until recently I had been training a model using TF 1.15 based on MobileNetV2.

After training I had always been able to run these commands to generate a TFLite version:

    tf.keras.backend.set_learning_phase(0)

    converter = tf.lite.TFLiteConverter.from_keras_model_file(
        tf_keras_path)

    converter.optimizations = [tf.lite.Optimize.DEFAULT]

    converter.target_spec.supported_types = [
        tf.lite.constants.FLOAT16]
    tflite_model = converter.convert()

The resulting model was fast enough for our needs, and when our Android developer enabled XNNPack we got an extra 30% reduction in inference time.

More recently I've developed a replacement model using TF 2.4.1, based on the built-in Keras implementation of EfficientNet-B2.

This new model has a larger input image size (260x260 vs 224x224), and its Keras inference time is about 1.5x that of the older model.

However, when I convert to TFLite using these commands:

    converter = tf.lite.TFLiteConverter.from_keras_model(newest_v3)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.target_spec.supported_types = [tf.float16]
    tflite_model = converter.convert()

there are a number of problems:

- Inference with the converted TFLite model is roughly 5x slower than the old model, rather than the ~1.5x slowdown I see in Keras.
- When our Android developer enables XNNPack, the model doesn't work with it, so we get no speedup there.

I have also tried saving as SavedModel and converting the saved model.
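
For reference, this is roughly what the SavedModel attempt looks like (a minimal sketch; newest_v3 is the Keras model and the export directory name is just illustrative):

    import tensorflow as tf

    # Export the Keras model in the SavedModel format, then convert the directory.
    tf.saved_model.save(newest_v3, "newest_v3/")

    converter = tf.lite.TFLiteConverter.from_saved_model("newest_v3/")
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.target_spec.supported_types = [tf.float16]
    tflite_model = converter.convert()

    with open("newest_v3d.tflite", "wb") as f:
        f.write(tflite_model)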

Another attempt I made was to use the command line tool with these arguments (and as far as I recall, pretty much every permutation of arguments possible):

    tflite_convert --saved_model_dir newest_v3/ \
        --enable_v1_converter \
        --experimental_new_converter=true \
        --input_shapes=1,260,260,3 \
        --input_arrays=input_1 \
        --post_training_quantize \
        --quantize_to_float16 \
        --output_file newest_v3d.tflite \
        --allow_custom_ops

If anyone can shed some light onto what's going on here I'd be very grateful.

Upvotes: 1

Views: 1371

Answers (1)

Mr K.

Reputation: 1201

TensorFlow Lite does currently support tensors with dynamic shapes (enabled by default, and explicitly by the "experimental_new_converter" option in your conversion), but the issue below points out that XNNPack does not:

https://github.com/tensorflow/tensorflow/issues/42491
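
A common workaround is to pin the input to a fully static shape before conversion, so the resulting TFLite graph has no dynamic dimensions for XNNPack to deal with. A minimal sketch, assuming newest_v3 is the Keras model and a fixed batch size of 1:

    import tensorflow as tf

    # Wrap the Keras model in a tf.function and fix the input signature to a
    # fully static shape (batch 1, 260x260 RGB).
    run_model = tf.function(lambda x: newest_v3(x))
    concrete_func = run_model.get_concrete_function(
        tf.TensorSpec([1, 260, 260, 3], tf.float32))

    converter = tf.lite.TFLiteConverter.from_concrete_functions([concrete_func])
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.target_spec.supported_types = [tf.float16]
    tflite_model = converter.convert()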

Because XNNPack is not able to optimize the graph of the EfficientNet model, you are not getting that boost in performance, which makes inference about 5 times slower than before instead of only around 1.5 times.

Personally, I would recommend moving to EfficientNet-Lite, as it is the mobile/TPU counterpart of EfficientNet and was designed with the restricted set of operations available in TensorFlow Lite in mind:

https://blog.tensorflow.org/2020/03/higher-accuracy-on-vision-models-with-efficientnet-lite.html
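
For example, a classifier on an EfficientNet-Lite backbone from TensorFlow Hub might look roughly like this (a sketch only; the hub handle, 224x224 input size and classification head are assumptions, not something from your setup):

    import tensorflow as tf
    import tensorflow_hub as hub

    num_classes = 10  # illustrative

    # EfficientNet-lite0 feature extractor from TF Hub (handle assumed here;
    # pick the lite variant that matches your accuracy/latency target).
    backbone = hub.KerasLayer(
        "https://tfhub.dev/tensorflow/efficientnet/lite0/feature-vector/2",
        trainable=True)

    model = tf.keras.Sequential([
        tf.keras.layers.InputLayer(input_shape=(224, 224, 3)),
        backbone,
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])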

Upvotes: 2
