user123

Reputation: 31

FP16 inference on CPU with PyTorch

I have a pretrained PyTorch model that I want to run inference with in fp16 instead of fp32. I have already tried this on the GPU, but when I try it on the CPU I get: "sum_cpu" not implemented for 'Half'. Any fixes?
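Roughly, a minimal example of the failure (a placeholder tensor, not my real model; newer PyTorch releases may have since added Half kernels for more CPU ops):

import torch

x = torch.randn(8, dtype=torch.float16)  # fp16 tensor on the CPU
x.sum()  # raises: "sum_cpu" not implemented for 'Half' on the version I am using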

Upvotes: 3

Views: 13624

Answers (3)

Ramya R

Reputation: 183

Check out this documentation: https://intel.github.io/intel-extension-for-pytorch/latest/tutorials/features/amp.html. Intel Extension for PyTorch supports an Auto Mixed Precision (AMP) feature for CPUs. In code, use torch.cpu.amp.autocast() instead of torch.autocast(device_type="cpu"). torch.cpu.amp supports the BFloat16 data type.
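A rough sketch of how that might look (my own illustration with a placeholder model; it assumes a recent PyTorch and, optionally, Intel Extension for PyTorch installed):

import torch
import torch.nn as nn

# Placeholder model; substitute your own pretrained network.
model = nn.Sequential(
    nn.Conv2d(3, 8, 3), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10),
).eval()

x = torch.randn(1, 3, 224, 224)  # placeholder input

# Optional, if Intel Extension for PyTorch is installed:
# import intel_extension_for_pytorch as ipex
# model = ipex.optimize(model, dtype=torch.bfloat16)

with torch.no_grad(), torch.cpu.amp.autocast(dtype=torch.bfloat16):
    out = model(x)

print(out.dtype)  # typically torch.bfloat16 for autocast-eligible ops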

Upvotes: 1

dragon7

Reputation: 1133

If you have an Intel CPU you could try OpenVINO. It allows you to convert your model into Intermediate Representation (IR) and then run it on the CPU with FP16 support. I cannot guarantee your model is convertible (it depends on whether you have fancy custom layers), but it's worth a try. You can find a full tutorial on how to convert a PyTorch model here. Some snippets are below.

Install OpenVINO

The easiest way is with pip. Alternatively, you can use this tool to find the best option for your case.

pip install openvino-dev[pytorch,onnx]

Save your model to ONNX

OpenVINO cannot convert a PyTorch model directly for now, but it can convert an ONNX model. The sample code below assumes the model is for computer vision.

import torch  # assumes `model` is your loaded PyTorch model

# IMAGE_HEIGHT and IMAGE_WIDTH are placeholders for your input image size.
dummy_input = torch.randn(1, 3, IMAGE_HEIGHT, IMAGE_WIDTH)
torch.onnx.export(model, dummy_input, "model.onnx", opset_version=11)

Use Model Optimizer to convert ONNX model

The Model Optimizer is a command-line tool that comes with the OpenVINO Development Package, so make sure you have installed it. It converts the ONNX model to IR, the default format for OpenVINO, and also changes the precision to FP16. Run in the command line:

mo --input_model "model.onnx" --input_shape "[1,3,224,224]" --mean_values="[123.675,116.28,103.53]" --scale_values="[58.395,57.12,57.375]" --data_type FP16 --output_dir "model_ir"

Run the inference on the CPU

The converted model can be loaded by the runtime and compiled for a specific device, e.g. the CPU.

from openvino.runtime import Core

# Load the network
ie = Core()
model_ir = ie.read_model(model="model_ir/model.xml")
compiled_model_ir = ie.compile_model(model=model_ir, device_name="CPU")

# Get the output layer
output_layer_ir = compiled_model_ir.output(0)

# Run inference on the input image (input_image is your preprocessed input array)
result = compiled_model_ir([input_image])[output_layer_ir]

Disclaimer: I work on OpenVINO.

Upvotes: 1

Jerry Xie

Reputation: 21

As far as I know, many CPU operations in PyTorch are not implemented for FP16; it is NVIDIA GPUs that have hardware support for FP16 (e.g. Tensor Cores in the Turing architecture), which PyTorch has supported since roughly CUDA 7.0. To accelerate inference on the CPU with reduced precision, you may want to try the torch.bfloat16 dtype instead (https://github.com/pytorch/pytorch/issues/23509).
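For example, something along these lines (my own sketch with a placeholder model; it assumes the ops in your network have BFloat16 CPU kernels):

import torch
import torch.nn as nn

# Placeholder model; substitute your own pretrained network.
model = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 1)).eval()
model = model.to(torch.bfloat16)           # cast weights to bfloat16

x = torch.randn(1, 16).to(torch.bfloat16)  # cast inputs to match
with torch.no_grad():
    out = model(x)

print(out.dtype)  # torch.bfloat16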

Upvotes: 2
