Reputation: 419
PyTorch offers the torch.Tensor.unfold operation, which can be chained over arbitrarily many dimensions to extract overlapping patches. How can we reverse the patch extraction so that the patches are combined back into the shape of the input?
The focus is 3D volumetric images with 1 channel (biomedical). Extraction is possible with unfold; how can we combine the patches if they overlap?
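For concreteness, a minimal sketch of the extraction step (the sizes are just illustrative):

import torch

x = torch.randn(1, 1, 8, 8, 8)  # (B, C, D, H, W), 1 channel
k, s = 4, 2                     # window size and stride
# Each unfold call slides a window over one dimension and appends it as a new axis.
patches = x.unfold(2, k, s).unfold(3, k, s).unfold(4, k, s)
print(patches.shape)  # torch.Size([1, 1, 3, 3, 3, 4, 4, 4]): 27 overlapping 4x4x4 patches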
Upvotes: 4
Views: 1887
Reputation: 419
The above solution makes copies in memory, as it keeps the patches contiguous. This leads to memory issues for large volumes with many overlapping voxels. To extract patches without making a copy in memory, we can do the following in PyTorch:
import torch

def get_dim_blocks(dim_in, kernel_size, padding=0, stride=1, dilation=1):
    return (dim_in + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

def extract_patches_3d(x, kernel_size, stride=1, dilation=1):
    if isinstance(kernel_size, int):
        kernel_size = (kernel_size, kernel_size, kernel_size)
    if isinstance(stride, int):
        stride = (stride, stride, stride)
    if isinstance(dilation, int):
        dilation = (dilation, dilation, dilation)

    x = x.contiguous()

    channels, depth, height, width = x.shape[-4:]
    d_blocks = get_dim_blocks(depth, kernel_size=kernel_size[0], stride=stride[0], dilation=dilation[0])
    h_blocks = get_dim_blocks(height, kernel_size=kernel_size[1], stride=stride[1], dilation=dilation[1])
    w_blocks = get_dim_blocks(width, kernel_size=kernel_size[2], stride=stride[2], dilation=dilation[2])
    # One stride entry per output dimension: how far to move in the underlying
    # storage to reach the next block, or the next voxel within a block.
    shape = (channels, d_blocks, h_blocks, w_blocks, kernel_size[0], kernel_size[1], kernel_size[2])
    strides = (width*height*depth,
               stride[0]*width*height,
               stride[1]*width,
               stride[2],
               dilation[0]*width*height,
               dilation[1]*width,
               dilation[2])

    x = x.as_strided(shape, strides)
    x = x.permute(1, 2, 3, 0, 4, 5, 6)
    return x
The method expects a tensor of shape `(B, C, D, H, W)`. The method is based on this and this answer (in NumPy), which explain in more detail what memory strides do. The output will be non-contiguous, and the first 3 dimensions will be the number of blocks or sliding windows in the D, H and W dimensions. Combining them into 1 dimension is not possible, as this would require a copy to contiguous memory.
Test with stride
a = torch.arange(81, dtype=torch.float32).view(1,3,3,3,3)
print(a)
b = extract_patches_3d(a, kernel_size=2, stride=2)
print(b.shape)
print(b.storage())
print(a.data_ptr() == b.data_ptr())
print(b)
Output
tensor([[[[[ 0., 1., 2.],
[ 3., 4., 5.],
[ 6., 7., 8.]],
[[ 9., 10., 11.],
[12., 13., 14.],
[15., 16., 17.]],
[[18., 19., 20.],
[21., 22., 23.],
[24., 25., 26.]]],
[[[27., 28., 29.],
[30., 31., 32.],
[33., 34., 35.]],
[[36., 37., 38.],
[39., 40., 41.],
[42., 43., 44.]],
[[45., 46., 47.],
[48., 49., 50.],
[51., 52., 53.]]],
[[[54., 55., 56.],
[57., 58., 59.],
[60., 61., 62.]],
[[63., 64., 65.],
[66., 67., 68.],
[69., 70., 71.]],
[[72., 73., 74.],
[75., 76., 77.],
[78., 79., 80.]]]]])
torch.Size([1, 1, 1, 3, 2, 2, 2])
0.0
1.0
2.0
3.0
4.0
5.0
6.0
7.0
8.0
9.0
10.0
11.0
12.0
13.0
14.0
15.0
16.0
17.0
18.0
19.0
20.0
21.0
22.0
23.0
24.0
25.0
26.0
27.0
28.0
29.0
30.0
31.0
32.0
33.0
34.0
35.0
36.0
37.0
38.0
39.0
40.0
41.0
42.0
43.0
44.0
45.0
46.0
47.0
48.0
49.0
50.0
51.0
52.0
53.0
54.0
55.0
56.0
57.0
58.0
59.0
60.0
61.0
62.0
63.0
64.0
65.0
66.0
67.0
68.0
69.0
70.0
71.0
72.0
73.0
74.0
75.0
76.0
77.0
78.0
79.0
80.0
[torch.FloatStorage of size 81]
True
tensor([[[[[[[ 0., 1.],
[ 3., 4.]],
[[ 9., 10.],
[12., 13.]]],
[[[27., 28.],
[30., 31.]],
[[36., 37.],
[39., 40.]]],
[[[54., 55.],
[57., 58.]],
[[63., 64.],
[66., 67.]]]]]]])
Reversing with summation of overlapping voxels using memory strides is not possible if the tensor is contiguous (as it would be after processing in a neural network). However, you can manually sum the patches as explained above, or with slicing as explained here.
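For illustration, a minimal sketch of the manual summation (the helper name combine_patches_3d_sum is my own, and it assumes the patch layout (d_blocks, h_blocks, w_blocks, C, k0, k1, k2) returned by extract_patches_3d above):

def combine_patches_3d_sum(patches, out_shape, kernel_size, stride):
    # patches: (d_blocks, h_blocks, w_blocks, C, k0, k1, k2); out_shape: (C, D, H, W)
    out = torch.zeros(out_shape, dtype=patches.dtype)
    d_blocks, h_blocks, w_blocks = patches.shape[:3]
    k0, k1, k2 = kernel_size
    s0, s1, s2 = stride
    for i in range(d_blocks):
        for j in range(h_blocks):
            for k in range(w_blocks):
                # Overlapping voxels are summed, matching what fold would do.
                out[:, i*s0:i*s0+k0, j*s1:j*s1+k1, k*s2:k*s2+k2] += patches[i, j, k]
    return out

# e.g. with the test above: combine_patches_3d_sum(b, (3, 3, 3, 3), (2, 2, 2), (2, 2, 2))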
Upvotes: 2
Reputation: 419
To extract (overlapping) patches and to reconstruct the input shape, we can use the torch.nn.functional.unfold and the inverse operation torch.nn.functional.fold. These methods only process 4D tensors (batched 2D images); however, you can apply them to one dimension at a time.
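As a quick illustration of the one-dimension-at-a-time idea (a minimal sketch with arbitrary sizes): setting the kernel size to 1 in one slot makes unfold slide along the other dimension only.

import torch

x = torch.arange(12, dtype=torch.float).view(1, 1, 3, 4)  # (B, C, H, W)
# Kernel (2, 1): slides only along H; W is left untouched (kernel 1, stride 1 there).
u = torch.nn.functional.unfold(x, kernel_size=(2, 1), stride=(1, 1))
print(u.shape)  # torch.Size([1, 2, 8]) = (B, C * 2, h_dim_out * W)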
A few notes:

- This way requires the fold/unfold methods from PyTorch; unfortunately, I have yet to find a similar method in the TF API.
- We start with 2D, then 3D, then 4D to show the incremental differences; you can extend to arbitrarily many dimensions (probably write a loop instead of hardcoding each dimension like I did).
- We can extract patches in 2 ways with the same output. The methods are called extract_patches_Xd and extract_patches_Xds, where X is the number of dimensions. The latter uses torch.Tensor.unfold() and has fewer lines of code (the output is the same, except it cannot use dilation).
- The methods extract_patches_Xd and combine_patches_Xd are inverse methods, and the combiner reverses the steps from the extractor step by step.
- The lines of code are followed by a comment stating the dimensionality, such as (B, C, T, D, H, W). The following abbreviations are used:
  - B: Batch size
  - C: Channels
  - T: Time dimension
  - D: Depth dimension
  - H: Height dimension
  - W: Width dimension
  - x_dim_in: In the extraction method, the number of input pixels in dimension x. In the combining method, the number of sliding windows in dimension x.
  - x_dim_out: In the extraction method, the number of sliding windows in dimension x. In the combining method, the number of output pixels in dimension x.

I have a public notebook to try out the code.
I have tried out basic 2D, 3D and 4D tensors as shown below. However, my code is not infallible and I appreciate feedback when tested on other inputs.
The get_dim_blocks() method is the function given on the PyTorch docs website to compute the output shape of a convolutional layer.
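As a quick sanity check (a minimal sketch with arbitrary hyperparameters), it agrees with the spatial output size of an actual convolution:

import torch

def get_dim_blocks(dim_in, dim_kernel_size, dim_padding=0, dim_stride=1, dim_dilation=1):
    return (dim_in + 2 * dim_padding - dim_dilation * (dim_kernel_size - 1) - 1) // dim_stride + 1

conv = torch.nn.Conv2d(1, 1, kernel_size=2, padding=1, stride=2, dilation=1)
out = conv(torch.zeros(1, 1, 4, 4))
print(out.shape[2], get_dim_blocks(4, 2, 1, 2, 1))  # 3 3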
Note that if you have overlapping patches and you combine them, the overlapping elements will be summed. If you would like to get the initial input back again, you can divide the result by the overlap count, obtained by running torch.ones_like(patches_tensor) through the same combine method.
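A minimal sketch of that trick, using the 2D methods defined below:

a = torch.arange(16, dtype=torch.float).view(1, 1, 4, 4)
patches = extract_patches_2d(a, 2, padding=1, stride=1)  # overlapping patches
summed = combine_patches_2d(patches, 2, a.shape, padding=1, stride=1)
# Combining all-ones patches counts how often each pixel was covered by a patch...
counts = combine_patches_2d(torch.ones_like(patches), 2, a.shape, padding=1, stride=1)
# ...so dividing the summed result by the counts recovers the original input.
print(torch.allclose(a, summed / counts))  # True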
First (2D):
The torch.nn.functional.fold and torch.nn.functional.unfold methods can be used directly.
import torch
def extract_patches_2ds(x, kernel_size, padding=0, stride=1):
    if isinstance(kernel_size, int):
        kernel_size = (kernel_size, kernel_size)
    if isinstance(padding, int):
        padding = (padding, padding, padding, padding)
    if isinstance(stride, int):
        stride = (stride, stride)

    channels = x.shape[1]

    x = torch.nn.functional.pad(x, padding)
    # (B, C, H, W)
    x = x.unfold(2, kernel_size[0], stride[0]).unfold(3, kernel_size[1], stride[1])
    # (B, C, h_dim_out, w_dim_out, kernel_size[0], kernel_size[1])
    x = x.contiguous().view(-1, channels, kernel_size[0], kernel_size[1])
    # (B * h_dim_out * w_dim_out, C, kernel_size[0], kernel_size[1])
    return x
def extract_patches_2d(x, kernel_size, padding=0, stride=1, dilation=1):
    if isinstance(kernel_size, int):
        kernel_size = (kernel_size, kernel_size)
    if isinstance(padding, int):
        padding = (padding, padding)
    if isinstance(stride, int):
        stride = (stride, stride)
    if isinstance(dilation, int):
        dilation = (dilation, dilation)

    def get_dim_blocks(dim_in, dim_kernel_size, dim_padding=0, dim_stride=1, dim_dilation=1):
        dim_out = (dim_in + 2 * dim_padding - dim_dilation * (dim_kernel_size - 1) - 1) // dim_stride + 1
        return dim_out

    channels = x.shape[1]
    h_dim_in = x.shape[2]
    w_dim_in = x.shape[3]
    h_dim_out = get_dim_blocks(h_dim_in, kernel_size[0], padding[0], stride[0], dilation[0])
    w_dim_out = get_dim_blocks(w_dim_in, kernel_size[1], padding[1], stride[1], dilation[1])

    # (B, C, H, W)
    x = torch.nn.functional.unfold(x, kernel_size, padding=padding, stride=stride, dilation=dilation)
    # (B, C * kernel_size[0] * kernel_size[1], h_dim_out * w_dim_out)
    x = x.view(-1, channels, kernel_size[0], kernel_size[1], h_dim_out, w_dim_out)
    # (B, C, kernel_size[0], kernel_size[1], h_dim_out, w_dim_out)
    x = x.permute(0, 1, 4, 5, 2, 3)
    # (B, C, h_dim_out, w_dim_out, kernel_size[0], kernel_size[1])
    x = x.contiguous().view(-1, channels, kernel_size[0], kernel_size[1])
    # (B * h_dim_out * w_dim_out, C, kernel_size[0], kernel_size[1])
    return x
def combine_patches_2d(x, kernel_size, output_shape, padding=0, stride=1, dilation=1):
    if isinstance(kernel_size, int):
        kernel_size = (kernel_size, kernel_size)
    if isinstance(padding, int):
        padding = (padding, padding)
    if isinstance(stride, int):
        stride = (stride, stride)
    if isinstance(dilation, int):
        dilation = (dilation, dilation)

    def get_dim_blocks(dim_in, dim_kernel_size, dim_padding=0, dim_stride=1, dim_dilation=1):
        dim_out = (dim_in + 2 * dim_padding - dim_dilation * (dim_kernel_size - 1) - 1) // dim_stride + 1
        return dim_out

    channels = x.shape[1]
    h_dim_out, w_dim_out = output_shape[2:]
    h_dim_in = get_dim_blocks(h_dim_out, kernel_size[0], padding[0], stride[0], dilation[0])
    w_dim_in = get_dim_blocks(w_dim_out, kernel_size[1], padding[1], stride[1], dilation[1])

    # (B * h_dim_in * w_dim_in, C, kernel_size[0], kernel_size[1])
    x = x.view(-1, channels, h_dim_in, w_dim_in, kernel_size[0], kernel_size[1])
    # (B, C, h_dim_in, w_dim_in, kernel_size[0], kernel_size[1])
    x = x.permute(0, 1, 4, 5, 2, 3)
    # (B, C, kernel_size[0], kernel_size[1], h_dim_in, w_dim_in)
    x = x.contiguous().view(-1, channels * kernel_size[0] * kernel_size[1], h_dim_in * w_dim_in)
    # (B, C * kernel_size[0] * kernel_size[1], h_dim_in * w_dim_in)
    x = torch.nn.functional.fold(x, (h_dim_out, w_dim_out), kernel_size=(kernel_size[0], kernel_size[1]), padding=padding, stride=stride, dilation=dilation)
    # (B, C, H, W)
    return x
a = torch.arange(1, 65, dtype=torch.float).view(2,2,4,4)
print(a.shape)
print(a)
b = extract_patches_2d(a, 2, padding=1, stride=2, dilation=1)
# b = extract_patches_2ds(a, 2, padding=1, stride=2)
print(b.shape)
print(b)
c = combine_patches_2d(b, 2, (2,2,4,4), padding=1, stride=2, dilation=1)
print(c.shape)
print(c)
print(torch.all(a==c))
Output (2D)
torch.Size([2, 2, 4, 4])
tensor([[[[ 1., 2., 3., 4.],
[ 5., 6., 7., 8.],
[ 9., 10., 11., 12.],
[13., 14., 15., 16.]],
[[17., 18., 19., 20.],
[21., 22., 23., 24.],
[25., 26., 27., 28.],
[29., 30., 31., 32.]]],
[[[33., 34., 35., 36.],
[37., 38., 39., 40.],
[41., 42., 43., 44.],
[45., 46., 47., 48.]],
[[49., 50., 51., 52.],
[53., 54., 55., 56.],
[57., 58., 59., 60.],
[61., 62., 63., 64.]]]])
torch.Size([18, 2, 2, 2])
tensor([[[[ 0., 0.],
[ 0., 1.]],
[[ 0., 0.],
[ 2., 3.]]],
[[[ 0., 0.],
[ 4., 0.]],
[[ 0., 5.],
[ 0., 9.]]],
[[[ 6., 7.],
[10., 11.]],
[[ 8., 0.],
[12., 0.]]],
[[[ 0., 13.],
[ 0., 0.]],
[[14., 15.],
[ 0., 0.]]],
[[[16., 0.],
[ 0., 0.]],
[[ 0., 0.],
[ 0., 17.]]],
[[[ 0., 0.],
[18., 19.]],
[[ 0., 0.],
[20., 0.]]],
[[[ 0., 21.],
[ 0., 25.]],
[[22., 23.],
[26., 27.]]],
[[[24., 0.],
[28., 0.]],
[[ 0., 29.],
[ 0., 0.]]],
[[[30., 31.],
[ 0., 0.]],
[[32., 0.],
[ 0., 0.]]],
[[[ 0., 0.],
[ 0., 33.]],
[[ 0., 0.],
[34., 35.]]],
[[[ 0., 0.],
[36., 0.]],
[[ 0., 37.],
[ 0., 41.]]],
[[[38., 39.],
[42., 43.]],
[[40., 0.],
[44., 0.]]],
[[[ 0., 45.],
[ 0., 0.]],
[[46., 47.],
[ 0., 0.]]],
[[[48., 0.],
[ 0., 0.]],
[[ 0., 0.],
[ 0., 49.]]],
[[[ 0., 0.],
[50., 51.]],
[[ 0., 0.],
[52., 0.]]],
[[[ 0., 53.],
[ 0., 57.]],
[[54., 55.],
[58., 59.]]],
[[[56., 0.],
[60., 0.]],
[[ 0., 61.],
[ 0., 0.]]],
[[[62., 63.],
[ 0., 0.]],
[[64., 0.],
[ 0., 0.]]]])
torch.Size([2, 2, 4, 4])
tensor([[[[ 1., 2., 3., 4.],
[ 5., 6., 7., 8.],
[ 9., 10., 11., 12.],
[13., 14., 15., 16.]],
[[17., 18., 19., 20.],
[21., 22., 23., 24.],
[25., 26., 27., 28.],
[29., 30., 31., 32.]]],
[[[33., 34., 35., 36.],
[37., 38., 39., 40.],
[41., 42., 43., 44.],
[45., 46., 47., 48.]],
[[49., 50., 51., 52.],
[53., 54., 55., 56.],
[57., 58., 59., 60.],
[61., 62., 63., 64.]]]])
tensor(True)
Second (3D):
Now it becomes interesting: we need 2 unfold/fold passes. For extraction, we first unfold over the D dimension and leave H and W untouched by setting the kernel to 1, padding to 0, stride to 1 and dilation to 1 in the other slot. After we view (reshape) the tensor, we unfold over the H and W dimensions. The combining happens in reverse, folding first over H and W, then over D.
def extract_patches_3ds(x, kernel_size, padding=0, stride=1):
    if isinstance(kernel_size, int):
        kernel_size = (kernel_size, kernel_size, kernel_size)
    if isinstance(padding, int):
        padding = (padding, padding, padding, padding, padding, padding)
    if isinstance(stride, int):
        stride = (stride, stride, stride)

    channels = x.shape[1]

    x = torch.nn.functional.pad(x, padding)
    # (B, C, D, H, W)
    x = x.unfold(2, kernel_size[0], stride[0]).unfold(3, kernel_size[1], stride[1]).unfold(4, kernel_size[2], stride[2])
    # (B, C, d_dim_out, h_dim_out, w_dim_out, kernel_size[0], kernel_size[1], kernel_size[2])
    x = x.contiguous().view(-1, channels, kernel_size[0], kernel_size[1], kernel_size[2])
    # (B * d_dim_out * h_dim_out * w_dim_out, C, kernel_size[0], kernel_size[1], kernel_size[2])
    return x
def extract_patches_3d(x, kernel_size, padding=0, stride=1, dilation=1):
    if isinstance(kernel_size, int):
        kernel_size = (kernel_size, kernel_size, kernel_size)
    if isinstance(padding, int):
        padding = (padding, padding, padding)
    if isinstance(stride, int):
        stride = (stride, stride, stride)
    if isinstance(dilation, int):
        dilation = (dilation, dilation, dilation)

    def get_dim_blocks(dim_in, dim_kernel_size, dim_padding=0, dim_stride=1, dim_dilation=1):
        dim_out = (dim_in + 2 * dim_padding - dim_dilation * (dim_kernel_size - 1) - 1) // dim_stride + 1
        return dim_out

    channels = x.shape[1]
    d_dim_in = x.shape[2]
    h_dim_in = x.shape[3]
    w_dim_in = x.shape[4]
    d_dim_out = get_dim_blocks(d_dim_in, kernel_size[0], padding[0], stride[0], dilation[0])
    h_dim_out = get_dim_blocks(h_dim_in, kernel_size[1], padding[1], stride[1], dilation[1])
    w_dim_out = get_dim_blocks(w_dim_in, kernel_size[2], padding[2], stride[2], dilation[2])
    # print(d_dim_in, h_dim_in, w_dim_in, d_dim_out, h_dim_out, w_dim_out)

    # (B, C, D, H, W)
    x = x.view(-1, channels, d_dim_in, h_dim_in * w_dim_in)
    # (B, C, D, H * W)
    x = torch.nn.functional.unfold(x, kernel_size=(kernel_size[0], 1), padding=(padding[0], 0), stride=(stride[0], 1), dilation=(dilation[0], 1))
    # (B, C * kernel_size[0], d_dim_out * H * W)
    x = x.view(-1, channels * kernel_size[0] * d_dim_out, h_dim_in, w_dim_in)
    # (B, C * kernel_size[0] * d_dim_out, H, W)
    x = torch.nn.functional.unfold(x, kernel_size=(kernel_size[1], kernel_size[2]), padding=(padding[1], padding[2]), stride=(stride[1], stride[2]), dilation=(dilation[1], dilation[2]))
    # (B, C * kernel_size[0] * d_dim_out * kernel_size[1] * kernel_size[2], h_dim_out * w_dim_out)
    x = x.view(-1, channels, kernel_size[0], d_dim_out, kernel_size[1], kernel_size[2], h_dim_out, w_dim_out)
    # (B, C, kernel_size[0], d_dim_out, kernel_size[1], kernel_size[2], h_dim_out, w_dim_out)
    x = x.permute(0, 1, 3, 6, 7, 2, 4, 5)
    # (B, C, d_dim_out, h_dim_out, w_dim_out, kernel_size[0], kernel_size[1], kernel_size[2])
    x = x.contiguous().view(-1, channels, kernel_size[0], kernel_size[1], kernel_size[2])
    # (B * d_dim_out * h_dim_out * w_dim_out, C, kernel_size[0], kernel_size[1], kernel_size[2])
    return x
def combine_patches_3d(x, kernel_size, output_shape, padding=0, stride=1, dilation=1):
    if isinstance(kernel_size, int):
        kernel_size = (kernel_size, kernel_size, kernel_size)
    if isinstance(padding, int):
        padding = (padding, padding, padding)
    if isinstance(stride, int):
        stride = (stride, stride, stride)
    if isinstance(dilation, int):
        dilation = (dilation, dilation, dilation)

    def get_dim_blocks(dim_in, dim_kernel_size, dim_padding=0, dim_stride=1, dim_dilation=1):
        dim_out = (dim_in + 2 * dim_padding - dim_dilation * (dim_kernel_size - 1) - 1) // dim_stride + 1
        return dim_out

    channels = x.shape[1]
    d_dim_out, h_dim_out, w_dim_out = output_shape[2:]
    d_dim_in = get_dim_blocks(d_dim_out, kernel_size[0], padding[0], stride[0], dilation[0])
    h_dim_in = get_dim_blocks(h_dim_out, kernel_size[1], padding[1], stride[1], dilation[1])
    w_dim_in = get_dim_blocks(w_dim_out, kernel_size[2], padding[2], stride[2], dilation[2])
    # print(d_dim_in, h_dim_in, w_dim_in, d_dim_out, h_dim_out, w_dim_out)

    x = x.view(-1, channels, d_dim_in, h_dim_in, w_dim_in, kernel_size[0], kernel_size[1], kernel_size[2])
    # (B, C, d_dim_in, h_dim_in, w_dim_in, kernel_size[0], kernel_size[1], kernel_size[2])
    x = x.permute(0, 1, 5, 2, 6, 7, 3, 4)
    # (B, C, kernel_size[0], d_dim_in, kernel_size[1], kernel_size[2], h_dim_in, w_dim_in)
    x = x.contiguous().view(-1, channels * kernel_size[0] * d_dim_in * kernel_size[1] * kernel_size[2], h_dim_in * w_dim_in)
    # (B, C * kernel_size[0] * d_dim_in * kernel_size[1] * kernel_size[2], h_dim_in * w_dim_in)
    x = torch.nn.functional.fold(x, output_size=(h_dim_out, w_dim_out), kernel_size=(kernel_size[1], kernel_size[2]), padding=(padding[1], padding[2]), stride=(stride[1], stride[2]), dilation=(dilation[1], dilation[2]))
    # (B, C * kernel_size[0] * d_dim_in, H, W)
    x = x.view(-1, channels * kernel_size[0], d_dim_in * h_dim_out * w_dim_out)
    # (B, C * kernel_size[0], d_dim_in * H * W)
    x = torch.nn.functional.fold(x, output_size=(d_dim_out, h_dim_out * w_dim_out), kernel_size=(kernel_size[0], 1), padding=(padding[0], 0), stride=(stride[0], 1), dilation=(dilation[0], 1))
    # (B, C, D, H * W)
    x = x.view(-1, channels, d_dim_out, h_dim_out, w_dim_out)
    # (B, C, D, H, W)
    return x
a = torch.arange(1, 129, dtype=torch.float).view(2,2,2,4,4)
print(a.shape)
print(a)
# b = extract_patches_3d(a, 2, padding=1, stride=2)
b = extract_patches_3ds(a, 2, padding=1, stride=2)
print(b.shape)
print(b)
c = combine_patches_3d(b, 2, (2,2,2,4,4), padding=1, stride=2)
print(c.shape)
print(c)
print(torch.all(a==c))
Output (3D)
(I had to limit the characters; please look at the notebook.)
Third (4D):
We add a time dimension to the 3D volume. For extraction, we start by unfolding over just the T dimension, leaving D, H and W alone, similarly to the 3D version. Then we unfold over D, leaving H and W, and finally over H and W together. The combining again happens in reverse. Hopefully by now you notice the pattern, so you can add arbitrarily many dimensions and unfold them one by one.
def extract_patches_4ds(x, kernel_size, padding=0, stride=1):
    if isinstance(kernel_size, int):
        kernel_size = (kernel_size, kernel_size, kernel_size, kernel_size)
    if isinstance(padding, int):
        padding = (padding, padding, padding, padding, padding, padding, padding, padding)
    if isinstance(stride, int):
        stride = (stride, stride, stride, stride)

    channels = x.shape[1]

    x = torch.nn.functional.pad(x, padding)
    # (B, C, T, D, H, W)
    x = x.unfold(2, kernel_size[0], stride[0]).unfold(3, kernel_size[1], stride[1]).unfold(4, kernel_size[2], stride[2]).unfold(5, kernel_size[3], stride[3])
    # (B, C, t_dim_out, d_dim_out, h_dim_out, w_dim_out, kernel_size[0], kernel_size[1], kernel_size[2], kernel_size[3])
    x = x.contiguous().view(-1, channels, kernel_size[0], kernel_size[1], kernel_size[2], kernel_size[3])
    # (B * t_dim_out * d_dim_out * h_dim_out * w_dim_out, C, kernel_size[0], kernel_size[1], kernel_size[2], kernel_size[3])
    return x
def extract_patches_4d(x, kernel_size, padding=0, stride=1, dilation=1):
    if isinstance(kernel_size, int):
        kernel_size = (kernel_size, kernel_size, kernel_size, kernel_size)
    if isinstance(padding, int):
        padding = (padding, padding, padding, padding)
    if isinstance(stride, int):
        stride = (stride, stride, stride, stride)
    if isinstance(dilation, int):
        dilation = (dilation, dilation, dilation, dilation)

    def get_dim_blocks(dim_in, dim_kernel_size, dim_padding=0, dim_stride=1, dim_dilation=1):
        dim_out = (dim_in + 2 * dim_padding - dim_dilation * (dim_kernel_size - 1) - 1) // dim_stride + 1
        return dim_out

    channels = x.shape[1]
    t_dim_in = x.shape[2]
    d_dim_in = x.shape[3]
    h_dim_in = x.shape[4]
    w_dim_in = x.shape[5]
    t_dim_out = get_dim_blocks(t_dim_in, kernel_size[0], padding[0], stride[0], dilation[0])
    d_dim_out = get_dim_blocks(d_dim_in, kernel_size[1], padding[1], stride[1], dilation[1])
    h_dim_out = get_dim_blocks(h_dim_in, kernel_size[2], padding[2], stride[2], dilation[2])
    w_dim_out = get_dim_blocks(w_dim_in, kernel_size[3], padding[3], stride[3], dilation[3])
    # print(t_dim_in, d_dim_in, h_dim_in, w_dim_in, t_dim_out, d_dim_out, h_dim_out, w_dim_out)

    # (B, C, T, D, H, W)
    x = x.view(-1, channels, t_dim_in, d_dim_in * h_dim_in * w_dim_in)
    # (B, C, T, D * H * W)
    x = torch.nn.functional.unfold(x, kernel_size=(kernel_size[0], 1), padding=(padding[0], 0), stride=(stride[0], 1), dilation=(dilation[0], 1))
    # (B, C * kernel_size[0], t_dim_out * D * H * W)
    x = x.view(-1, channels * kernel_size[0] * t_dim_out, d_dim_in, h_dim_in * w_dim_in)
    # (B, C * kernel_size[0] * t_dim_out, D, H * W)
    x = torch.nn.functional.unfold(x, kernel_size=(kernel_size[1], 1), padding=(padding[1], 0), stride=(stride[1], 1), dilation=(dilation[1], 1))
    # (B, C * kernel_size[0] * t_dim_out * kernel_size[1], d_dim_out * H * W)
    x = x.view(-1, channels * kernel_size[0] * t_dim_out * kernel_size[1] * d_dim_out, h_dim_in, w_dim_in)
    # (B, C * kernel_size[0] * t_dim_out * kernel_size[1] * d_dim_out, H, W)
    x = torch.nn.functional.unfold(x, kernel_size=(kernel_size[2], kernel_size[3]), padding=(padding[2], padding[3]), stride=(stride[2], stride[3]), dilation=(dilation[2], dilation[3]))
    # (B, C * kernel_size[0] * t_dim_out * kernel_size[1] * d_dim_out * kernel_size[2] * kernel_size[3], h_dim_out * w_dim_out)
    x = x.view(-1, channels, kernel_size[0], t_dim_out, kernel_size[1], d_dim_out, kernel_size[2], kernel_size[3], h_dim_out, w_dim_out)
    # (B, C, kernel_size[0], t_dim_out, kernel_size[1], d_dim_out, kernel_size[2], kernel_size[3], h_dim_out, w_dim_out)
    x = x.permute(0, 1, 3, 5, 8, 9, 2, 4, 6, 7)
    # (B, C, t_dim_out, d_dim_out, h_dim_out, w_dim_out, kernel_size[0], kernel_size[1], kernel_size[2], kernel_size[3])
    x = x.contiguous().view(-1, channels, kernel_size[0], kernel_size[1], kernel_size[2], kernel_size[3])
    # (B * t_dim_out * d_dim_out * h_dim_out * w_dim_out, C, kernel_size[0], kernel_size[1], kernel_size[2], kernel_size[3])
    return x
def combine_patches_4d(x, kernel_size, output_shape, padding=0, stride=1, dilation=1):
    if isinstance(kernel_size, int):
        kernel_size = (kernel_size, kernel_size, kernel_size, kernel_size)
    if isinstance(padding, int):
        padding = (padding, padding, padding, padding)
    if isinstance(stride, int):
        stride = (stride, stride, stride, stride)
    if isinstance(dilation, int):
        dilation = (dilation, dilation, dilation, dilation)

    def get_dim_blocks(dim_in, dim_kernel_size, dim_padding=0, dim_stride=1, dim_dilation=1):
        dim_out = (dim_in + 2 * dim_padding - dim_dilation * (dim_kernel_size - 1) - 1) // dim_stride + 1
        return dim_out

    channels = x.shape[1]
    t_dim_out, d_dim_out, h_dim_out, w_dim_out = output_shape[2:]
    t_dim_in = get_dim_blocks(t_dim_out, kernel_size[0], padding[0], stride[0], dilation[0])
    d_dim_in = get_dim_blocks(d_dim_out, kernel_size[1], padding[1], stride[1], dilation[1])
    h_dim_in = get_dim_blocks(h_dim_out, kernel_size[2], padding[2], stride[2], dilation[2])
    w_dim_in = get_dim_blocks(w_dim_out, kernel_size[3], padding[3], stride[3], dilation[3])
    # print(t_dim_in, d_dim_in, h_dim_in, w_dim_in, t_dim_out, d_dim_out, h_dim_out, w_dim_out)

    x = x.view(-1, channels, t_dim_in, d_dim_in, h_dim_in, w_dim_in, kernel_size[0], kernel_size[1], kernel_size[2], kernel_size[3])
    # (B, C, t_dim_in, d_dim_in, h_dim_in, w_dim_in, kernel_size[0], kernel_size[1], kernel_size[2], kernel_size[3])
    x = x.permute(0, 1, 6, 2, 7, 3, 8, 9, 4, 5)
    # (B, C, kernel_size[0], t_dim_in, kernel_size[1], d_dim_in, kernel_size[2], kernel_size[3], h_dim_in, w_dim_in)
    x = x.contiguous().view(-1, channels * kernel_size[0] * t_dim_in * kernel_size[1] * d_dim_in * kernel_size[2] * kernel_size[3], h_dim_in * w_dim_in)
    # (B, C * kernel_size[0] * t_dim_in * kernel_size[1] * d_dim_in * kernel_size[2] * kernel_size[3], h_dim_in * w_dim_in)
    x = torch.nn.functional.fold(x, output_size=(h_dim_out, w_dim_out), kernel_size=(kernel_size[2], kernel_size[3]), padding=(padding[2], padding[3]), stride=(stride[2], stride[3]), dilation=(dilation[2], dilation[3]))
    # (B, C * kernel_size[0] * t_dim_in * kernel_size[1] * d_dim_in, H, W)
    x = x.view(-1, channels * kernel_size[0] * t_dim_in * kernel_size[1], d_dim_in * h_dim_out * w_dim_out)
    # (B, C * kernel_size[0] * t_dim_in * kernel_size[1], d_dim_in * H * W)
    x = torch.nn.functional.fold(x, output_size=(d_dim_out, h_dim_out * w_dim_out), kernel_size=(kernel_size[1], 1), padding=(padding[1], 0), stride=(stride[1], 1), dilation=(dilation[1], 1))
    # (B, C * kernel_size[0] * t_dim_in, D, H * W)
    x = x.view(-1, channels * kernel_size[0], t_dim_in * d_dim_out * h_dim_out * w_dim_out)
    # (B, C * kernel_size[0], t_dim_in * D * H * W)
    x = torch.nn.functional.fold(x, output_size=(t_dim_out, d_dim_out * h_dim_out * w_dim_out), kernel_size=(kernel_size[0], 1), padding=(padding[0], 0), stride=(stride[0], 1), dilation=(dilation[0], 1))
    # (B, C, T, D * H * W)
    x = x.view(-1, channels, t_dim_out, d_dim_out, h_dim_out, w_dim_out)
    # (B, C, T, D, H, W)
    return x
a = torch.arange(1, 129, dtype=torch.float).view(2,2,2,2,4,2)
print(a.shape)
print(a)
# b = extract_patches_4d(a, 2, padding=1, stride=2)
b = extract_patches_4ds(a, 2, padding=1, stride=2)
print(b.shape)
print(b)
c = combine_patches_4d(b, 2, (2,2,2,2,4,2), padding=1, stride=2)
print(c.shape)
print(c)
print(torch.all(a==c))
Output (4D)
(I had to limit the characters; please look at the notebook.)
Upvotes: 6