Luke

Reputation: 7089

How to manually implement padding for pytorch convolutions

I'm trying to port some pytorch code to tensorflow 2.0 and am having difficulty figuring out how to translate the convolution functions between the two. The way both libraries deal with padding is the sticking point. Basically, I'd like to understand how I can manually produce the padding that pytorch does under the hood so that I can translate that to tensorflow.

The code below works if I don't do any padding, but I can't figure out how to make the two implementations match once any padding is added.

import numpy as np
import tensorflow as tf
import torch
import torch.nn.functional as F

output_padding = SOME NUMBER
padding = SOME OTHER NUMBER
strides = 128

tensor = np.random.rand(2, 258, 249)
filters = np.random.rand(258, 1, 256)

out_torch = F.conv_transpose1d(
    torch.from_numpy(tensor).float(),
    torch.from_numpy(filters).float(),
    stride=strides,
    padding=padding,
    output_padding=output_padding)

def pytorch_transpose_conv1d(inputs, filters, strides, padding, output_padding):
    N, L_in = inputs.shape[0], inputs.shape[2]
    out_channels, kernel_size = filters.shape[1], filters.shape[2]
    time_out = (L_in - 1) * strides - 2 * padding + (kernel_size - 1) + output_padding + 1
    padW = (kernel_size - 1) - padding
    
    # HOW DO I PAD HERE TO GET THE SAME OUTPUT AS IN PYTORCH
    inputs = tf.pad(inputs, [(?, ?), (?, ?), (?, ?)])

    return tf.nn.conv1d_transpose(
        inputs,
        tf.transpose(filters, perm=(2, 1, 0)),
        output_shape=(N, out_channels, time_out),
        strides=strides,
        padding="VALID",
        data_format="NCW")

out_tf = pytorch_transpose_conv1d(tensor, filters, strides, padding, output_padding)
assert np.allclose(out_tf.numpy(), out_torch.numpy())

Upvotes: 1

Views: 4337

Answers (1)

Girish Hegde

Reputation: 1515

Padding


To translate the convolution and transpose convolution functions (with their padding arguments) between PyTorch and TensorFlow, we first need to understand the F.pad() and tf.pad() functions.

torch.nn.functional.pad(input, pad, mode='constant', value=0):

  • pad: the padding sizes by which to pad some dimensions of input, described starting from the last dimension and moving forward.
  • To pad only the last dimension of the input tensor, pad has the form (padding_left, padding_right).
  • To pad the last 3 dimensions, use (padding_left, padding_right, padding_top, padding_bottom, padding_front, padding_back).

tf.pad(tensor, paddings, mode='CONSTANT', constant_values=0, name=None)

  • paddings: an integer tensor with shape [n, 2], where n is the rank of the tensor. For each dimension D of the input, paddings[D, 0] indicates how many values to add before the contents of the tensor in that dimension, and paddings[D, 1] indicates how many values to add after the contents of the tensor in that dimension.

Here's a table showing F.pad and tf.pad equivalents, along with the output tensor, for the input tensor [[[1, 1], [1, 1]]], which has shape (1, 2, 2):

[image: table of F.pad and tf.pad equivalents]
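
As a quick check, here's a minimal sketch (padding amounts chosen purely for illustration) that verifies the F.pad / tf.pad correspondence on that same (1, 2, 2) input:

import numpy as np
import tensorflow as tf
import torch
import torch.nn.functional as F

x = np.array([[[1, 1], [1, 1]]], dtype=np.float32)   # shape (1, 2, 2)

# pad only the last dimension by 1 on each side
pt  = F.pad(torch.from_numpy(x), [1, 1]).numpy()
tf_ = tf.pad(x, [[0, 0], [0, 0], [1, 1]]).numpy()
print(np.array_equal(pt, tf_))   # True

# pad the last two dimensions: (left, right, top, bottom) = (1, 1, 2, 2)
pt  = F.pad(torch.from_numpy(x), [1, 1, 2, 2]).numpy()
tf_ = tf.pad(x, [[0, 0], [2, 2], [1, 1]]).numpy()
print(np.array_equal(pt, tf_))   # True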


Padding in Convolution


Let's now move on to PyTorch padding in convolution layers:

  1. F.conv1d(input, ..., padding, ...):

    • padding controls the amount of implicit zero padding added to both sides of the input.
    • padding=(size) applies F.pad(input, [size, size]), i.e. it pads the last dimension with (size, size), which is equivalent to tf.pad(input, [[0, 0], [0, 0], [size, size]]) (see the sketch after this list).
  2. F.conv2d(input, ..., padding, ...):

    • padding=(size) applies F.pad(input, [size, size, size, size]), i.e. it pads the last 2 dimensions with (size, size), which for an NCHW input is equivalent to tf.pad(input, [[0, 0], [0, 0], [size, size], [size, size]]).
    • padding=(size1, size2) applies F.pad(input, [size2, size2, size1, size1]), which for an NCHW input is equivalent to tf.pad(input, [[0, 0], [0, 0], [size1, size1], [size2, size2]]).
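
As a sanity check on the F.conv1d case, here's a minimal sketch (shapes and padding chosen purely for illustration) showing that padding=p gives the same result as padding the last dimension explicitly and then running a "valid" convolution:

import torch
import torch.nn.functional as F

x = torch.rand(2, 3, 10)    # (batch, in_channels, width)
w = torch.rand(4, 3, 5)     # (out_channels, in_channels, kernel_size)
p = 2

out_implicit = F.conv1d(x, w, padding=p)                  # implicit padding inside conv1d
out_explicit = F.conv1d(F.pad(x, [p, p]), w, padding=0)   # same padding done by hand

print(torch.allclose(out_implicit, out_explicit))   # True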

Padding in Transpose Convolution


PyTorch padding in Transpose Convolution layers

  1. F.conv_transpose1d(input, ..., padding, output_padding, ...):
    • dilation * (kernel_size - 1) - padding amount of zero padding will be added to both sides of each dimension in the input.
    • Padding in transposed convolutions can be seen as allocating fake outputs that are removed afterwards.
    • output_padding controls the additional size added to one side of the output shape.
    • See the "How pytorch transpose conv works" link below to understand what exactly happens during a transpose convolution in PyTorch.
    • Here's the formula to calculate the output size of a transpose convolution:

output_size = (input_size - 1)*stride + (kernel_size - 1) + 1 + output_padding - 2*padding
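
For example, plugging in the demo values used further below (input_size=7, kernel_size=3, stride=3, output_padding=2, padding=5):

output_size = (7 - 1)*3 + (3 - 1) + 1 + 2 - 2*5 = 18 + 2 + 1 + 2 - 10 = 13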


Codes


Transpose Convolution

import torch
import torch.nn as nn
import torch.nn.functional as F
import tensorflow as tf
import numpy as np

# to stop tf check-failed errors; not relevant to the actual code
import os
os.environ["CUDA_DEVICE_ORDER"]    = "PCI_BUS_ID"   
os.environ["CUDA_VISIBLE_DEVICES"] = "1"




def tconv(tensor, filters, output_padding=0, padding=0, strides=1):
    '''
    tensor         : input tensor of shape (batch_size, channels, W) i.e (NCW)
    filters        : input kernel of shape (in_ch, out_ch, kernel_size)
    output_padding : single number, must be smaller than either stride or dilation
    padding        : single number, should be less than or equal to ((valid output size + output_padding) // 2)
    strides        : single number
    '''
    bs, in_ch, W = tensor.shape
    in_ch, out_ch, k_sz = filters.shape
    
    out_torch = F.conv_transpose1d(torch.from_numpy(tensor).float(), 
                                   torch.from_numpy(filters).float(),
                                   stride=strides, padding=padding, 
                                   output_padding=output_padding)
    out_torch = out_torch.numpy()
 
    # output_size = (input_size - 1)*stride + (kernel_size - 1) + 1 + output_padding - 2*padding
    # valid out size -> padding=0, output_padding=0 
    # -> valid_out_size =  (input_size - 1)*stride + (kerenel_size - 1) + 1
    out_size  = (W - 1)*strides + (k_sz - 1) + 1 

    # input shape -> (batch_size, W, in_ch) and filters shape -> (kernel_size, out_ch, in_ch) for tf conv
    valid_tf  = tf.nn.conv1d_transpose(np.transpose(tensor, axes=(0, 2, 1)), 
                                       np.transpose(filters, axes=(2, 1, 0)), 
                                       output_shape=(bs, out_size, out_ch), 
                                       strides=strides, padding='VALID', 
                                       data_format='NWC')
    # output padding
    tf_outpad = tf.pad(valid_tf, [[0, 0], [0, output_padding], [0, 0]])
    # NWC to NCW
    tf_outpad = np.transpose(tf_outpad, (0, 2, 1))

    # padding -> input, begin, shape -> remove `padding` elements on both side
    out_tf    = tf.slice(tf_outpad, [0, 0, padding], [bs, out_ch, tf_outpad.shape[2]-2*padding])

    out_tf    = np.array(out_tf)

    print('output size(tf, torch):', out_tf.shape, out_torch.shape)
    # print('out_torch:\n', out_torch)
    # print('out_tf:\n', out_tf)
    print('outputs are close:', np.allclose(out_tf, out_torch))



tensor  = np.random.rand(2, 1, 7)
filters = np.random.rand(1, 2, 3)
tconv(tensor, filters, output_padding=2, padding=5, strides=3)

Results

>>> tensor  = np.random.rand(2, 258, 249)
>>> filters = np.random.rand(258, 1, 7)
>>> tconv(tensor, filters, output_padding=4, padding=9, strides=6)
output size(tf, torch): (2, 1, 1481) (2, 1, 1481)
outputs are close: True

Some useful links:

  1. pytorch 'SAME' convolution

  2. How pytorch transpose conv works

Upvotes: 3
