Reputation: 72
I am using the TensorRT API to optimize a U-Net model built with Keras. The results after optimization are not good enough, so I am thinking of building the same model directly in TensorFlow, since Keras is a high-level API and maybe its inference is slower. So my question is: will building the same model in TensorFlow improve inference compared to the Keras model? And does TensorRT optimize a TensorFlow model better than a Keras model?
I did some research but didn't find anything about the inference speed of the same model in TensorFlow versus Keras.
Upvotes: 0
Views: 1389
Reputation: 1133
I don't think rebuilding the entire network in pure TensorFlow is worth it. I wouldn't expect to see much performance gain.
If using TensorRT doesn't give you good results, I suggest trying OpenVINO. OpenVINO is optimized for Intel hardware, but it should work with any CPU. It optimizes your model by converting it to Intermediate Representation (IR), performing graph pruning, and fusing some operations into others while preserving accuracy. It then uses vectorization at runtime.
It's rather straightforward to convert the Keras model to OpenVINO. The full tutorial on how to do it can be found here. Some snippets are below.
Install OpenVINO
The easiest way to do it is using PIP. Alternatively, you can use this tool to find the best way in your case.
pip install openvino-dev[tensorflow2]
Save your model as SavedModel
OpenVINO cannot convert an HDF5 model directly, so you have to save it as a SavedModel first.
import tensorflow as tf
from custom_layer import CustomLayer
model = tf.keras.models.load_model('model.h5', custom_objects={'CustomLayer': CustomLayer})
tf.saved_model.save(model, 'model')
Use Model Optimizer to convert the SavedModel
The Model Optimizer is a command-line tool that comes with the OpenVINO Development Package. It converts the TensorFlow model to IR, the default format for OpenVINO. You can also try FP16 precision, which should give you better performance without a significant accuracy drop (change data_type). Run in the command line:
mo --saved_model_dir "model" --data_type FP32 --output_dir "model_ir"
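For example, an FP16 conversion of the same model would look like this (the output directory name below is just illustrative):
mo --saved_model_dir "model" --data_type FP16 --output_dir "model_ir_fp16"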
Run the inference
The converted model can be loaded by the runtime and compiled for a specific device, e.g., CPU or GPU (integrated into your CPU like Intel HD Graphics). If you don't know what the best choice for you is, use AUTO. You care about latency, so I suggest adding a performance hint (as shown below) to use the device that fulfills your requirement.
# Load the network (requires the OpenVINO runtime package)
from openvino.runtime import Core

ie = Core()
model_ir = ie.read_model(model="model_ir/model.xml")
compiled_model_ir = ie.compile_model(model=model_ir, device_name="AUTO", config={"PERFORMANCE_HINT": "LATENCY"})
# Get output layer
output_layer_ir = compiled_model_ir.output(0)
# Run inference on the input image
result = compiled_model_ir([input_image])[output_layer_ir]
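The input_image above has to match the shape and layout the IR expects. As a rough sketch building on the snippet above (assuming an NCHW input, OpenCV for loading, and a placeholder file name test.jpg; normalization and channel order depend on how your model was trained):
import cv2
import numpy as np

# Query the expected input shape from the compiled model, e.g. (1, 3, H, W)
n, c, h, w = compiled_model_ir.input(0).shape
image = cv2.imread("test.jpg")
resized = cv2.resize(image, (w, h))
# HWC -> CHW, then add a batch dimension; add BGR->RGB conversion / scaling if your model needs it
input_image = np.expand_dims(resized.transpose(2, 0, 1), 0).astype(np.float32)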
There is another notebook that compares performance for a PyTorch semantic segmentation model before and after conversion to OpenVINO. I would expect something similar for TensorFlow.
Disclaimer: I work on OpenVINO.
Upvotes: 0
Reputation: 2682
Keras (when using the TensorFlow backend) is a library that builds TensorFlow computational graphs. The computations are performed on those graphs, not by Keras directly. Unless you believe you can optimize the generated graph manually, you should expect no performance difference. You can use the TensorBoard Keras callback to visualize the TensorFlow model in TensorBoard and then decide whether you can manually optimize it. I would discourage anyone except ML researchers and ML library developers from going down that path.
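For reference, attaching the TensorBoard callback looks roughly like this (the toy model and random data below are only placeholders for your own U-Net and training data):
import numpy as np
import tensorflow as tf

# Toy model only for illustration; substitute your own U-Net
model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# write_graph=True dumps the underlying TensorFlow graph for TensorBoard
tb_callback = tf.keras.callbacks.TensorBoard(log_dir="logs", write_graph=True)
model.fit(np.random.rand(32, 4), np.random.rand(32, 1),
          epochs=1, callbacks=[tb_callback])
# Inspect the graph with: tensorboard --logdir logs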
If the issue is model accuracy / error metrics rather than CPU/GPU cycles during inference, I don't believe that converting to TensorFlow would necessarily improve the model.
If you want help with the model itself, perhaps you can reword the question with a description of the model (it would really help if it runs on a public dataset).
Upvotes: 0
Reputation: 14983
As far as I tested, there was no significant difference (maybe a tiny tiny overhead for Keras).
The better inference time you expect will not come from switching from Keras to TensorFlow. I have worked with TensorRT, and most of the problems come from the fact that not all layers are supported (for the conversion/optimization).
Ensure that the entire pipeline (Keras model -> TensorFlow model -> layer optimization -> TensorRT) uses the same version of TensorFlow. I would also recommend training the model via tensorflow.keras instead of the standalone keras package.
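In practice, that just means importing Keras through TensorFlow, for example:
# Use the Keras API bundled with TensorFlow ...
from tensorflow import keras
from tensorflow.keras import layers

# ... instead of the standalone package:
# import keras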
Also, make sure that you convert with the right floating-point precision (FP32/FP16/INT8). The biggest gain in inference speed comes from converting from standard FP32 to INT8. In my experience, converting from FP32 to FP16 does not speed things up significantly.
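As a rough sketch of where the precision is set when using TF-TRT (assuming TensorFlow 2.x with TensorRT installed; the SavedModel paths are placeholders, and INT8 additionally requires a calibration input function):
from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Choose FP32, FP16 or INT8 here
params = trt.TrtConversionParams(precision_mode=trt.TrtPrecisionMode.FP16)
converter = trt.TrtGraphConverterV2(
    input_saved_model_dir="model",   # placeholder path to your SavedModel
    conversion_params=params)
converter.convert()
converter.save("model_trt")          # writes the optimized SavedModel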
Semantic segmentation is among the most computationally expensive tasks, so don't expect a very fast inference model deployed on a TX2, for example (with TensorRT).
Upvotes: 1