Reputation: 11
I'm trying to perform static post-train quantization in PyTorch. For this example, I tried quantizing a Conv2d layer with a bias:
import torch
import torch.nn as nn
import torch.quantization as tq

def quantize(model, input_shape):
    with torch.no_grad():
        # model = tq.QuantWrapper(model)
        observer = tq.PerChannelMinMaxObserver()
        model.qconfig = torch.quantization.QConfig(
            activation=tq.MinMaxObserver,
            weight=observer.with_args(dtype=torch.qint8,
                                      qscheme=torch.per_channel_affine))
        # model.qconfig = torch.quantization.get_default_qconfig('qnnpack')
        model = tq.QuantWrapper(model)
        tq.prepare(model, inplace=True)
        # calibrate with a constant input so the observers see a fixed range
        for i in range(1000):
            x = torch.ones(2, *input_shape)
            # x = torch.randn(2, *input_shape)
            tmp = model(x)
        tq.convert(model, inplace=True)
    return model

input_shape = (5, 7, 7)
model_b = nn.Conv2d(input_shape[0], 2, 3, bias=True)
# zero all weights and set the bias to 0.5, so the conv output is exactly 0.5
for p in model_b.parameters():
    torch.nn.init.zeros_(p)
model_b.bias.data.fill_(.5)
model_b = quantize(model_b, input_shape)
model_b.eval()
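The two outputs below can be reproduced roughly like this (a sketch, relying on QuantWrapper exposing quant / module / dequant submodules so the quantized conv output can be read before the final dequantize step):

# Sketch: calling model_b(x) directly returns a dequantized float tensor
# (the DeQuantize step runs last), so stop before it to inspect the
# quantized conv output itself.
x = torch.ones(1, *input_shape)
q_out = model_b.module(model_b.quant(x))   # quantized (torch.quint8) output
print(q_out.int_repr())                    # integer representation
print(q_out)                               # float values plus scale / zero_point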
The PyTorch documentation explicitly states that bias is not quantized and is kept as a float tensor. The integer representation of the output yields:
tensor([[[[255, 255, 255, 255, 255],
          [255, 255, 255, 255, 255],
          [255, 255, 255, 255, 255],
          [255, 255, 255, 255, 255],
          [255, 255, 255, 255, 255]],
         [[255, 255, 255, 255, 255],
          [255, 255, 255, 255, 255],
          [255, 255, 255, 255, 255],
          [255, 255, 255, 255, 255],
          [255, 255, 255, 255, 255]]]], dtype=torch.uint8)
However, the float representation yields:
tensor([[[[0.5000, 0.5000, 0.5000, 0.5000, 0.5000],
          [0.5000, 0.5000, 0.5000, 0.5000, 0.5000],
          [0.5000, 0.5000, 0.5000, 0.5000, 0.5000],
          [0.5000, 0.5000, 0.5000, 0.5000, 0.5000],
          [0.5000, 0.5000, 0.5000, 0.5000, 0.5000]],
         [[0.5000, 0.5000, 0.5000, 0.5000, 0.5000],
          [0.5000, 0.5000, 0.5000, 0.5000, 0.5000],
          [0.5000, 0.5000, 0.5000, 0.5000, 0.5000],
          [0.5000, 0.5000, 0.5000, 0.5000, 0.5000],
          [0.5000, 0.5000, 0.5000, 0.5000, 0.5000]]]], size=(1, 2, 5, 5),
       dtype=torch.quint8, quantization_scheme=torch.per_tensor_affine,
       scale=0.0019607844296842813, zero_point=0)
I have searched for information on this and concluded that the scale and zero point used to requantize the convolution output take the bias into account, and also that during the GEMM the bias is quantized to int32_t before being added to the int32_t result of the GEMM. From the example above, if the bias were simply cast to int32_t, the integer and float outputs would both be 0.
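For concreteness, here is the arithmetic I believe is behind the two outputs above (a sketch, assuming the activation observer saw values in [0, 0.5] and therefore chose scale = 0.5 / 255 with zero_point = 0):

# Output qparams reported above: scale ≈ 0.5 / 255, zero_point = 0
scale = 0.5 / 255        # ≈ 0.0019607844
q = 255                  # quantized value printed by int_repr()
print((q - 0) * scale)   # dequantizes back to ≈ 0.5, matching the float view

# If the float bias (0.5) were simply truncated to int32_t, it would vanish:
print(int(0.5))          # 0  -> the output would have been 0, not 0.5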
My question is: how is the bias quantized to int32_t, if it isn't converted to a quantized tensor?
Upvotes: 0
Views: 1730
Reputation: 3388
The bias is not quantized by the convert() API; instead, it is quantized to int32 on the fly during inference. You can have a look at aten/src/ATen/native/quantized/cpu/qconv.cpp and https://zhuanlan.zhihu.com/p/299108528
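As a rough sketch (not the actual qconv.cpp code; names and numbers are illustrative), the kernel quantizes the float bias with scale = input_scale * weight_scale and zero_point = 0, so it lands on the same grid as the int32 GEMM accumulator and can be added directly before requantization:

# Illustrative per-layer arithmetic, mirroring the numbers in the question
input_scale  = 0.5 / 255   # from the activation observer
weight_scale = 1.0         # per-channel in practice; value here is illustrative
bias_fp      = 0.5

bias_scale = input_scale * weight_scale
bias_int32 = round(bias_fp / bias_scale)   # on-the-fly int32 quantization of the bias

acc_int32  = 0                             # GEMM result (all weights are zero here)
acc_int32 += bias_int32                    # bias added in the int32 domain

# requantize the accumulator to the output's quint8 grid
output_scale, output_zp = 0.5 / 255, 0
out_q = round(acc_int32 * bias_scale / output_scale) + output_zp
print(out_q)                               # 255, matching int_repr() in the question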
Upvotes: 0