In the ONNX Runtime documentation on quantization, here:
https://onnxruntime.ai/docs/performance/model-optimizations/quantization.html#quantize-to-int4uint4
The example sets accuracy_level=4, which I read as meaning 4-bit quantization, i.e. int4/uint4.
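For reference, this is roughly the snippet from that page (simplified; the paths are placeholders):

```python
from pathlib import Path

from onnxruntime.quantization import matmul_4bits_quantizer, quant_utils

model_fp32_path = "model_fp32.onnx"  # placeholder
model_int4_path = "model_int4.onnx"  # placeholder

quant_config = matmul_4bits_quantizer.DefaultWeightOnlyQuantConfig(
    block_size=128,     # power of 2, >= 16
    is_symmetric=True,  # True -> int4, False -> uint4
    accuracy_level=4,   # the attribute in question
)
model = quant_utils.load_model_with_shape_infer(Path(model_fp32_path))
quant = matmul_4bits_quantizer.MatMul4BitsQuantizer(model, algo_config=quant_config)
quant.process()
quant.model.save_model_to_file(model_int4_path, True)
```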
However, in the MatMulNBits operator documentation, an accuracy_level of 4 means int8:
https://github.com/microsoft/onnxruntime/blob/main/docs/ContribOperators.md#attributes-35
When I use that script to apply quantization, the resulting MatMulNBits node has accuracy_level=4 and bits=4, yet the weight tensor's data type is int8.
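Here is a minimal sketch of how I check the node attributes and the weight tensor's data type (the model path is a placeholder):

```python
import onnx

m = onnx.load("model_int4.onnx")  # placeholder path
inits = {t.name: t for t in m.graph.initializer}

for node in m.graph.node:
    if node.op_type == "MatMulNBits":
        attrs = {a.name: onnx.helper.get_attribute_value(a) for a in node.attribute}
        print(node.name, "bits =", attrs.get("bits"),
              "accuracy_level =", attrs.get("accuracy_level"))
        # The packed weight is the second input of MatMulNBits.
        b = inits[node.input[1]]
        print("weight tensor element type:", onnx.TensorProto.DataType.Name(b.data_type))
```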
So does this quantization actually convert the weights to int4?