Reputation: 166
I am using eager mode quantization, but I want to skip some layers from being quantized. I am following the tutorial here.
However, when I test the model I now get the following error:
Could not run 'aten::_slow_conv2d_forward' with arguments from the 'QuantizedCPU' backend.
If I understand correctly, this is because the layers with qconfig = None are receiving quantized tensors while expecting dequantized (fp32) tensors. Is there a way I can add an instruction to dequantize the data before such a layer and quantize it again after the layer, in my loop? Or what other workaround could I use for this purpose?
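To make the idea concrete, here is roughly what I imagine: a hypothetical wrapper (SkipQuantWrapper is a name I made up, not an existing API) that I could swap in for each skipped module:

import torch.nn as nn
from torch.quantization import QuantStub, DeQuantStub

class SkipQuantWrapper(nn.Module):
    """Hypothetical wrapper: run one layer in fp32 inside an otherwise quantized model."""
    def __init__(self, fp32_module, qconfig):
        super().__init__()
        self.dequant = DeQuantStub()  # int8 -> fp32 before the skipped layer
        self.module = fp32_module
        self.module.qconfig = None    # keep the wrapped layer unquantized
        self.quant = QuantStub()
        self.quant.qconfig = qconfig  # its observer learns scale/zero_point during calibration

    def forward(self, x):
        x = self.dequant(x)
        x = self.module(x)
        return self.quant(x)          # fp32 -> int8 for the next quantized layer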
The code to exclude layers:
for quantized_layer, module in fused_model.named_modules():
    if quantized_layer in sortedSensitivityDict:
        if sortedSensitivityDict[quantized_layer] > 0.94:
            module.qconfig = torch.quantization.get_default_qconfig("qnnpack")
        else:
            module.qconfig = None
The code to quantize:
import torch

model_fp32_prepared = torch.quantization.prepare(fused_model)

def calibrate(model, data_loader):
    model.eval()
    with torch.no_grad():
        for image, target in data_loader:
            model(image)

calibrate(model_fp32_prepared, val_loader)
model_fp32_prepared.eval()
model_int8 = torch.quantization.convert(model_fp32_prepared)
The main problem is that I am using MobileNetV3, where the forward function is as follows:
def _forward_impl(self, x: Tensor) -> Tensor:
    x = self.features(x)
    x = self.avgpool(x)
    x = torch.flatten(x, 1)
    x = self.classifier(x)
    return x
Since the layers are in self.features, I am not sure how to use self.quant and self.dequant.
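For reference, the quantizable MobileNetV3 in torchvision (torchvision.models.quantization.mobilenetv3) only brackets the whole network with the stubs, roughly like this, so they do not help for individual blocks inside self.features:

from torch.quantization import QuantStub, DeQuantStub

class QuantizableMobileNetV3(MobileNetV3):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.quant = QuantStub()      # quantizes the fp32 input
        self.dequant = DeQuantStub()  # dequantizes the int8 output

    def forward(self, x):
        x = self.quant(x)
        x = self._forward_impl(x)
        x = self.dequant(x)
        return x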
Upvotes: 0
Views: 875
Reputation: 51
Blog author here - that can be fairly tricky with eager mode, unfortunately. We have a new API using FX Graph Mode that makes operations like this easier. You won't need to set each module's qconfig; instead, you can pass a dict with the names of the layers you want to disable.
Something like:
disable_layers = []
for quantized_layer, _ in fused_model.named_modules():
    if quantized_layer in sortedSensitivityDict:
        if sortedSensitivityDict[quantized_layer] <= 0.94:  # the layers you set to qconfig = None above
            disable_layers.append(quantized_layer)
qconfig_dict = {
    # Global config
    "": torch.quantization.get_default_qconfig("qnnpack"),
    # Disable by layer name
    "module_name": [(m, None) for m in disable_layers],
    # Or disable by layer/op type
    "object_type": [
        (torch.add, None),  # skips quantization for all add ops
        ...,
    ],
}
model_fp32_prepared = torch.quantization.quantize_fx.prepare_fx(model, qconfig_dict)
# calibrate as usual
model_int8 = torch.quantization.quantize_fx.convert_fx(model_fp32_prepared)
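As an optional sanity check (not required), convert_fx returns a torch.fx.GraphModule, so you can print its graph to confirm that the disabled layers stayed in fp32 and that dequantize/quantize nodes were inserted automatically at their boundaries:

# The traced graph shows where quantize_per_tensor / dequantize nodes were placed.
print(model_int8.graph)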
For reference, I have a notebook walking through this workflow here: https://github.com/fbsamples/pytorch-quantization-workshop/blob/main/Quant_Workflow.ipynb
Upvotes: 0