Reputation: 21
The discrete convolution is by definition associative. But when I try to verify this in PyTorch, I cannot get a plausible result.
The associative law is $f*(g*\psi)=(f * g)*\psi$, so I create three discrete functions centered at zero (as tensors) and convolve them with enough zero padding that all non-zero elements of the result map are obtained.
import torch
import torch.nn as nn

def test_conv_compst():
    # $\psi$: input with 4 channels
    inputs = torch.randn((1, 4, 7, 7))
    # $g$: kernel mapping 4 channels to 7
    a = torch.randn((7, 4, 3, 3))
    # $f$: kernel mapping 7 channels to 3
    b = torch.randn((3, 7, 3, 3))
    int_1 = torch.conv2d(inputs, a, padding=2)
    # result obtained in the first order: f * (g * psi)
    res_1 = torch.conv2d(int_1, b, padding=2)
    # composed kernel f * g
    comp_k = torch.conv2d(a.transpose(1, 0), b, padding=2).transpose(1, 0)
    print(comp_k.shape)
    # result obtained in the second order: (f * g) * psi
    res_2 = torch.conv2d(inputs, comp_k, padding=4)
    print(res_1.shape)
    print(res_2.shape)
    print(torch.max(torch.abs(res_2 - res_1)))

test_conv_compst()
The expected result is that the difference between the two results is negligible, but it returns:
torch.Size([3, 4, 5, 5])
torch.Size([1, 3, 11, 11])
torch.Size([1, 3, 11, 11])
tensor(164.8044)
Upvotes: 2
Views: 801
Reputation: 13113
Long story short, this is because of batching. The first argument of torch.conv2d is interpreted as [batch, channel, height, width], the second as [out_channels, in_channels, height, width] and the output as [batch, channel, height, width]. So if you call conv2d(a, conv2d(b, c)), you treat b's leading dimension as batch, and if you call conv2d(conv2d(a, b), c), you treat it as out_channels.
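As a minimal sketch of the two interpretations (the shapes below are arbitrary, chosen only for illustration):

import torch

x = torch.randn(1, 4, 7, 7)   # read as [batch=1, channels=4, 7, 7]
w = torch.randn(7, 4, 3, 3)   # read as [out_channels=7, in_channels=4, 3, 3]

# Used as the second argument, w's leading 7 becomes the output channel count:
print(torch.conv2d(x, w, padding=1).shape)    # torch.Size([1, 7, 7, 7])

# Used as the first argument, the very same tensor is read as a batch of
# 7 images with 4 channels each:
w2 = torch.randn(5, 4, 3, 3)
print(torch.conv2d(w, w2, padding=1).shape)   # torch.Size([7, 5, 3, 3])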
That being said, I get the impression that you're asking about math here, so let me expand. Your idea is correct in theory: convolutions are linear operators and should be associative. However, since we provide them with kernels rather than the actual matrices representing the linear operators, there is some "conversion" that needs to happen behind the scenes so that the kernels are interpreted as matrices properly. Classically, this can be done by constructing the corresponding circulant matrices (border conditions aside). If we denote the kernels with a, b, c and the circulant matrix creation operator with M, we get that M(a) @ [M(b) @ M(c)] = [M(a) @ M(b)] @ M(c), where @ denotes matrix-matrix multiplication.
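A quick 1D sketch of this, assuming circular border conditions and using scipy.linalg.circulant to play the role of M (the names n and M below are purely illustrative):

import numpy as np
from scipy.linalg import circulant

# M(k): circulant matrix of a kernel zero-padded to length n, so that
# M(k) @ x is the circular convolution of k with a length-n signal x.
n = 8
M = lambda k: circulant(np.pad(k, (0, n - len(k))))

a, b, c = np.random.randn(3), np.random.randn(3), np.random.randn(3)

lhs = M(a) @ (M(b) @ M(c))
rhs = (M(a) @ M(b)) @ M(c)
print(np.abs(lhs - rhs).max())  # negligible: matrix multiplication is associative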
Convolution implementations return an image (vector, kernel, however you call it) and not the associated circulant matrix, which is ridiculously redundant and wouldn't fit in memory in most cases. Therefore we also need some circulant-to-vector operator V(matrix), which returns the first column of matrix and is therefore the inverse of M. In abstract mathematical terms, functions such as scipy.signal.convolve (actually correlate, since convolution requires an extra flip of one of the inputs, which I skip for clarity) are implemented as convolve = lambda a, b: V(M(a) @ M(b)) and thus
convolve(a, convolve(b, c)) =
= V(M(a) @ M(V[M(b) @ M(c)]))
= V(M(a) @ M(b) @ M(c))
= V(M(V[M(a) @ M(b)]) @ M(c))
= convolve(convolve(a, b), c)
I hope I haven't lost you: this is just converting one into the other by using the fact that V is the inverse of M, together with the associativity of matrix multiplication, to move the parentheses. Note that the middle line is basically the "raw" ABC. We can verify with the following code:
import numpy as np
import scipy.signal as sig

c2d = sig.convolve2d

a = np.random.randn(7, 7)
b = np.random.randn(3, 3)
c = np.random.randn(3, 3)

# group as (a * b) * c
ab = c2d(a, b)
ab_c = c2d(ab, c)

# group as a * (b * c)
bc = c2d(b, c)
a_bc = c2d(a, bc)

# maximum absolute difference between the two groupings is negligible
print(np.abs(a_bc - ab_c).max())
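The same check can be written with explicit M and V operators; here is a 1D sketch assuming circular border conditions, with M, V and convolve defined ad hoc to mirror the notation above:

import numpy as np
from scipy.linalg import circulant

n = 8
M = lambda k: circulant(np.pad(k, (0, n - len(k))))   # kernel -> circulant matrix
V = lambda A: A[:, 0]                                  # circulant matrix -> kernel
convolve = lambda a, b: V(M(a) @ M(b))                 # circular convolution

a, b, c = np.random.randn(3), np.random.randn(3), np.random.randn(3)
lhs = convolve(a, convolve(b, c))
rhs = convolve(convolve(a, b), c)
print(np.abs(lhs - rhs).max())  # effectively zero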
The problem with PyTorch is that it interprets the first input as [batch, channel, height, width] and the second as [out_channels, in_channels, height, width]. This means that the "conversion" operator M is different for the first argument and the second argument. Let's call them M and N, respectively. Since there is only one output, there is only one V, and it can be the inverse of either M or N, but not both (since they are different). If you rewrite the above equation taking care to distinguish between M and N, you will see that, depending on your choice of whether V inverts one or the other, you're unable to write the equality either between lines 2 and 3 or between lines 3 and 4.
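To see the contrast with the scipy case, where one and the same lifting M is applied to both arguments (so the two slots are even interchangeable, since full 2D convolution is commutative):

import numpy as np
import scipy.signal as sig

# scipy treats both arguments identically, so swapping them changes nothing:
a = np.random.randn(7, 7)
b = np.random.randn(3, 3)
print(np.abs(sig.convolve2d(a, b) - sig.convolve2d(b, a)).max())  # negligible

With torch.conv2d, swapping the arguments instead changes which axes are read as batch and which as out_channels/in_channels (M versus N above), so no such symmetry exists.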
In practice, there is also the additional issue of the channel dimension, which is not present in the classic definition of convolutions; however, my first guess is that it could be dealt with by a single lifting operator M for both operands, unlike batching.
Upvotes: 1