Owen Zhang

Reputation: 23

ONNX Runtime quantization script for MatMulNBits: what is the type after conversion?

In the ONNX Runtime documentation, in the quantization section here:

https://onnxruntime.ai/docs/performance/model-optimizations/quantization.html#quantize-to-int4uint4

The example sets accuracy_level=4, which I read as meaning 4-bit quantization, i.e. int4/uint4.

However, in the MatMulNBits documentation, an accuracy_level of 4 means int8:

https://github.com/microsoft/onnxruntime/blob/main/docs/ContribOperators.md#attributes-35

When I apply quantization with that script, the resulting MatMulNBits node has accuracy_level=4 and bits=4, yet the weight tensor's data type is int8.

So does this quantization actually convert the weights to int4?
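One thing I noticed while digging into this: 4-bit values are commonly stored packed two per byte, so a tensor of genuine int4 weights can still report an 8-bit storage dtype. A minimal sketch of that packing idea (hypothetical helpers, not the actual onnxruntime packing code):

```python
def pack_int4(values):
    """Pack unsigned 4-bit values (0..15) two per byte, low nibble first.
    Hypothetical helper to illustrate why int4 data can live in a
    uint8/int8 tensor -- not onnxruntime's real implementation."""
    if len(values) % 2:
        values = values + [0]  # pad to an even count
    packed = []
    for lo, hi in zip(values[::2], values[1::2]):
        packed.append(((hi & 0x0F) << 4) | (lo & 0x0F))
    return packed

def unpack_int4(packed):
    """Inverse: recover the 4-bit values from the packed bytes."""
    out = []
    for b in packed:
        out.append(b & 0x0F)        # low nibble
        out.append((b >> 4) & 0x0F)  # high nibble
    return out

weights = [1, 15, 7, 0, 9, 3]
packed = pack_int4(weights)
print(packed)                 # three bytes hold six 4-bit values
print(unpack_int4(packed))    # round-trips back to the originals
```

If MatMulNBits stores its weights this way, bits=4 and an int8/uint8 tensor dtype would not be contradictory.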

Upvotes: 0

Views: 11

Answers (0)
