TestCandidate

Reputation: 166

pytorch eager quantization - skipping modules

I am using eager mode quantization. However, I want to exclude some layers from quantization. I am following the tutorial here

However, when I test the model now I get the following error:

Could not run 'aten::_slow_conv2d_forward' with arguments from the 'QuantizedCPU' backend.

If I understand correctly, this happens because the layers with qconfig = None receive quantized tensors while expecting dequantized (fp32) tensors. Is there a way to insert a dequantize op before such a layer and a quantize op after it, inside my loop? Or what other workaround could I use for this purpose?
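To make the idea concrete, here is a rough sketch of the kind of wrapper I am imagining (the class name and the qconfig handling are my own guess, not something from the tutorial):

import torch
import torch.nn as nn

class DequantizeWrapper(nn.Module):
    # Hypothetical wrapper: run one skipped (fp32) layer inside an otherwise quantized model
    def __init__(self, module, qconfig):
        super().__init__()
        self.dequant = torch.quantization.DeQuantStub()  # int8 -> fp32 before the skipped layer
        self.module = module                             # the layer that keeps qconfig = None
        self.quant = torch.quantization.QuantStub()      # fp32 -> int8 after the skipped layer
        self.quant.qconfig = qconfig                     # the QuantStub itself still needs a qconfig

    def forward(self, x):
        x = self.dequant(x)
        x = self.module(x)
        return self.quant(x)

I am not sure whether replacing a module in place with such a wrapper before calling prepare is the intended way to do this.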

The code to exclude layers:

# Assign a qconfig only to layers that should be quantized; qconfig = None skips a layer
for name, module in fused_model.named_modules():
    if name in sortedSensitivityDict:
        if sortedSensitivityDict[name] > 0.94:
            module.qconfig = torch.quantization.get_default_qconfig("qnnpack")
        else:
            module.qconfig = None

The code to quantize:

import torch
model_fp32_prepared = torch.quantization.prepare(fused_model)

# Run calibration data through the prepared model so the observers record activation ranges
def calibrate(model, data_loader):
    model.eval()
    with torch.no_grad():
        for image, target in data_loader:
            model(image)

calibrate(model_fp32_prepared, val_loader)
model_fp32_prepared.eval()
model_int8 = torch.quantization.convert(model_fp32_prepared)

The main problem is that I am using MobileNetV3 where the forward function is as follows:

def _forward_impl(self, x: Tensor) -> Tensor:
    x = self.features(x)
    x = self.avgpool(x)
    x = torch.flatten(x, 1)
    x = self.classifier(x)
    return x

Since the layers are inside self.features, I am not sure where to place self.quant and self.dequant.
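For reference, the only pattern I know is wrapping the whole model with a single quant/dequant pair, roughly like the sketch below (my own wrapper class, following the usual QuantStub/DeQuantStub recipe), but that quantizes everything and does not let me skip individual layers inside self.features:

import torch
import torch.nn as nn

class QuantizableWrapper(nn.Module):
    # Standard whole-model pattern: quantize at the input, dequantize at the output
    def __init__(self, model):
        super().__init__()
        self.quant = torch.quantization.QuantStub()      # fp32 -> int8 at the model input
        self.model = model                               # e.g. MobileNetV3 (features, avgpool, classifier)
        self.dequant = torch.quantization.DeQuantStub()  # int8 -> fp32 at the model output

    def forward(self, x):
        x = self.quant(x)
        x = self.model(x)
        return self.dequant(x)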

Upvotes: 0

Views: 875

Answers (1)

Suraj813

Reputation: 51

Blog author here - that can be fairly tricky with eager mode, unfortunately. We have a new API using FX Graph Mode that makes operations like these easier. You won't need to set each module's qconfig; instead, you can pass a dict with the names of the layers you want to disable.

Something like:

# Collect the names of the layers you do not want to quantize
disable_layers = []
for name, module in fused_model.named_modules():
    if name in sortedSensitivityDict:
        if sortedSensitivityDict[name] <= 0.94:
            disable_layers.append(name)

qconfig_dict = {
    # Global config: quantize everything with this qconfig by default
    "": torch.quantization.get_default_qconfig("qnnpack"),

    # Disable by layer name
    "module_name": [(m, None) for m in disable_layers],

    # Or disable by op/layer type
    "object_type": [
        (torch.add, None),  # skips quantization for all add ops
        ...,
    ],
}

model_fp32_prepared = torch.quantization.quantize_fx.prepare_fx(model, qconfig_dict)

# calibrate as usual

model_int8 = torch.quantization.quantize_fx.convert_fx(model_fp32_prepared)

FYR, I have a notebook walking through this workflow here: https://github.com/fbsamples/pytorch-quantization-workshop/blob/main/Quant_Workflow.ipynb

Upvotes: 0
