Reputation: 328
To improve the latency of a trained model, I tried TensorFlow mixed precision.
Just setting the policy as described in https://www.tensorflow.org/guide/mixed_precision does not seem to increase the model's speed:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras import mixed_precision
policy = mixed_precision.Policy('mixed_float16')
mixed_precision.set_global_policy(policy)
But when trying a toy example with another CNN, I found that speed increases by a factor of ~2x if I train the model with mixed precision.
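For reference, my toy comparison looked roughly like this (a minimal sketch; the architecture, shapes, and batch size are arbitrary, just for timing):
import tensorflow as tf
from tensorflow.keras import layers, mixed_precision

mixed_precision.set_global_policy('mixed_float16')  # comment out to compare with float32

# A small throwaway CNN, only used to compare wall-clock time per epoch
model = tf.keras.Sequential([
    layers.Conv2D(64, 3, activation='relu', input_shape=(224, 224, 3)),
    layers.Conv2D(64, 3, activation='relu'),
    layers.GlobalAveragePooling2D(),
    # Keep the final layer in float32 for numeric stability, as the guide recommends
    layers.Dense(10, activation='softmax', dtype='float32'),
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')

# Random data is enough for a timing comparison
x = tf.random.normal((256, 224, 224, 3))
y = tf.random.uniform((256,), maxval=10, dtype=tf.int32)
model.fit(x, y, batch_size=32, epochs=2)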
I am trying to avoid retraining the model with mixed precision, since the model I am using is quite complex and converting it to be mixed-precision compatible is not an easy task.
Is there a way to convert an already-trained model so that it really runs in mixed-precision mode (and, of course, gains the mixed-precision speedup)?
Upvotes: 1
Views: 1091
Reputation: 1133
To improve the latency of a trained model, you could try OpenVINO. It's a toolkit heavily optimized for inference. However, it targets Intel hardware, such as CPUs and iGPUs (the GPU integrated into your CPU, like Intel HD Graphics), rather than Nvidia GPUs, but I think it is worth giving it a try. Here are some performance benchmarks.
Converting a Keras model to OpenVINO is rather straightforward unless you have fancy custom layers. The full tutorial on how to do it can be found here. Some snippets are below.
Install OpenVINO
The easiest way is with pip. Alternatively, you can use this tool to find the best option for your case.
pip install openvino-dev[tensorflow2]
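To quickly check that the installation worked (assuming the 2022.x Python API):
python -c "from openvino.runtime import get_version; print(get_version())"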
Save your model as SavedModel
OpenVINO cannot convert an HDF5 model directly, so you have to save it in the SavedModel format first.
import tensorflow as tf
from custom_layer import CustomLayer  # only needed if your model has custom layers

# Load the Keras HDF5 model and re-export it in the SavedModel format
model = tf.keras.models.load_model('model.h5', custom_objects={'CustomLayer': CustomLayer})
tf.saved_model.save(model, 'model')
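If you want to double-check the export, TensorFlow's saved_model_cli (shipped with the TensorFlow pip package) can inspect the resulting directory:
saved_model_cli show --dir model --all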
Use Model Optimizer to convert the SavedModel
The Model Optimizer is a command-line tool that comes with the OpenVINO Development Package. It converts the TensorFlow model to IR, the default format for OpenVINO. You can also try FP16 precision, which should give you better performance without a significant accuracy drop (just change data_type). Run in the command line:
mo --saved_model_dir "model" --input_shape "[1, 3, 224, 224]" --data_type FP32 --output_dir "model_ir"
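For example, the FP16 variant mentioned above would be (the output directory name is just a suggestion, to keep the two IRs apart):
mo --saved_model_dir "model" --input_shape "[1, 3, 224, 224]" --data_type FP16 --output_dir "model_ir_fp16"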
Run the inference
The converted model can be loaded by the runtime and compiled for a specific device, e.g. CPU or GPU (the one integrated into your CPU, like Intel HD Graphics). If you don't know what the best choice for you is, just use AUTO.
# Core comes from the OpenVINO runtime package
from openvino.runtime import Core

# Load the network
ie = Core()
model_ir = ie.read_model(model="model_ir/model.xml")
compiled_model_ir = ie.compile_model(model=model_ir, device_name="CPU")
# Get the output layer
output_layer_ir = compiled_model_ir.output(0)
# Run inference on the input image
result = compiled_model_ir([input_image])[output_layer_ir]
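If you just need a placeholder input for a smoke test, a random tensor matching the converted shape works (a real image would need the same preprocessing as during training; NCHW layout per the --input_shape above):
import numpy as np

# Dummy NCHW input matching --input_shape "[1, 3, 224, 224]"
input_image = np.random.rand(1, 3, 224, 224).astype(np.float32)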
Disclaimer: I work on OpenVINO.
Upvotes: 2