이준혁

Reputation: 277

Is there any way to optimize PyTorch inference on CPU?

I am going to serve a PyTorch model (ResNet-18) on a website.
However, inference on the CPU (AMD Ryzen 3600) takes about 70% of the CPU resources.
I don't think the server (Heroku) can handle this computation.
Is there any way to optimize inference on the CPU?
Many thanks

Upvotes: 5

Views: 5933

Answers (3)

Ramya R

Reputation: 183

If you are using an Intel CPU, please check out Intel Extension for PyTorch. The extension provides quantization features that speed up inference while keeping accuracy close to the FP32 baseline, even for large deep learning models.

Check out the article - https://www.intel.com/content/www/us/en/developer/articles/code-sample/accelerate-pytorch-models-using-quantization.html. This article also shows you a code sample on how to accelerate PyTorch-based models by applying Intel Extension for PyTorch quantization.
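To give a rough idea of the API, here is a minimal sketch. It only uses the extension's generic ipex.optimize() entry point with a torchvision resnet18 (both assumptions on my part); the full INT8 quantization flow is described in the linked article.

import torch
import intel_extension_for_pytorch as ipex
from torchvision.models import resnet18

model = resnet18(pretrained=True).eval()
# Apply IPEX operator/graph optimizations for CPU inference (FP32 path);
# the INT8 quantization workflow in the article builds on top of this.
model = ipex.optimize(model)

with torch.no_grad():
    output = model(torch.randn(1, 3, 224, 224))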

Additionally, there is a tool called Intel Neural Compressor. It is a model compression tool that helps speed up inference without sacrificing accuracy. Check out this article - https://www.intel.com/content/www/us/en/developer/articles/technical/pytorch-quantization-using-intel-neural-compressor.html - to learn how to perform INT8 quantization on a PyTorch model and optimize its performance using Intel Neural Compressor.
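As a rough sketch only, assuming the Neural Compressor 2.x API (PostTrainingQuantConfig and quantization.fit) and a hypothetical calib_loader; see the linked article for the exact workflow.

from neural_compressor import PostTrainingQuantConfig, quantization

# Post-training static quantization; calib_loader is a hypothetical
# DataLoader yielding representative input batches for calibration.
conf = PostTrainingQuantConfig(approach="static")
q_model = quantization.fit(model=model, conf=conf, calib_dataloader=calib_loader)
q_model.save("./quantized_model")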

Upvotes: 1

dragon7

Reputation: 1133

Admittedly, I'm not an expert on Heroku, but you can probably use OpenVINO. OpenVINO is optimized for Intel hardware, but it should work with any CPU. It improves inference performance by, for example, pruning the graph and fusing operations together. There are performance benchmarks for ResNet-18 converted from PyTorch.

You can find a full tutorial on how to convert the PyTorch model here. Some snippets below.

Install OpenVINO

The easiest way is to use pip. Alternatively, you can use this tool to find the best option for your case.

pip install openvino-dev[pytorch,onnx]

Save your model to ONNX

OpenVINO cannot convert a PyTorch model directly for now, but it can work with an ONNX model. This sample code assumes the model is for computer vision.

# ResNet-18 expects a 1x3x224x224 input; the dummy tensor defines the exported shape
dummy_input = torch.randn(1, 3, IMAGE_HEIGHT, IMAGE_WIDTH)
torch.onnx.export(model, dummy_input, "model.onnx", opset_version=11)

Use Model Optimizer to convert ONNX model

The Model Optimizer is a command-line tool that comes with the OpenVINO Development Package, so be sure you have installed it. It converts the ONNX model to OV format (also known as IR), which is the default format for OpenVINO. It also changes the precision to FP16 to further increase performance. Run in the command line:

mo --input_model "model.onnx" --input_shape "[1,3,224,224]" --mean_values="[123.675,116.28,103.53]" --scale_values="[58.395,57.12,57.375]" --data_type FP16 --output_dir "model_ir"

Run the inference on the CPU

The converted model can be loaded by the runtime and compiled for a specific device, e.g. CPU or GPU (integrated graphics such as Intel HD Graphics). If you don't know what the best choice for you is, just use AUTO.

from openvino.runtime import Core

# Load the network and compile it for the CPU
ie = Core()
model_ir = ie.read_model(model="model_ir/model.xml")
compiled_model_ir = ie.compile_model(model=model_ir, device_name="CPU")

# Get the output layer
output_layer_ir = compiled_model_ir.output(0)

# Run inference on the input image (a preprocessed NumPy array, see below)
result = compiled_model_ir([input_image])[output_layer_ir]
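
For completeness, here is one way input_image could be prepared (my own sketch, not part of the tutorial): the IR already embeds the mean/scale values from the mo command, so only resizing and layout conversion are needed.

import cv2
import numpy as np

# Hypothetical preprocessing: resize to 224x224 and convert HWC -> NCHW;
# normalization is baked into the IR via --mean_values/--scale_values.
image = cv2.cvtColor(cv2.imread("test.jpg"), cv2.COLOR_BGR2RGB)
image = cv2.resize(image, (224, 224))
input_image = np.expand_dims(image.transpose(2, 0, 1), 0).astype(np.float32)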

Disclaimer: I work on OpenVINO.

Upvotes: 2

Anvar Ganiev

Reputation: 189

You can try pruning and quantizing your model (techniques that compress the model for deployment, speeding up inference and saving energy without significant accuracy loss). There are examples of model pruning and quantization on the PyTorch website that you can check; a small combined sketch follows the links below.

https://pytorch.org/tutorials/intermediate/pruning_tutorial.html
https://pytorch.org/tutorials/advanced/dynamic_quantization_tutorial.html
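
A minimal sketch combining both tutorials, assuming a torchvision resnet18. Note that dynamic quantization targets Linear/LSTM layers, so on a conv-heavy ResNet the speedup is limited and static quantization may help more.

import torch
import torch.nn.utils.prune as prune
from torchvision.models import resnet18

model = resnet18(pretrained=True).eval()

# L1 unstructured pruning: zero out 30% of the weights in every conv layer
for module in model.modules():
    if isinstance(module, torch.nn.Conv2d):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")  # make the pruning permanent

# Dynamic quantization converts the (few) Linear layers to int8
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)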

Upvotes: 0
