I came across this PyTorch example for depthwise separable convolutions using the groups
parameter:
import torch.nn as nn

class depthwise_separable_conv(nn.Module):
    def __init__(self, nin, nout):
        super(depthwise_separable_conv, self).__init__()
        # depthwise: groups=nin gives each input channel its own 3x3 filter
        self.depthwise = nn.Conv2d(nin, nin, kernel_size=3, padding=1, groups=nin)
        # pointwise: a 1x1 conv that mixes the channels
        self.pointwise = nn.Conv2d(nin, nout, kernel_size=1)

    def forward(self, x):
        out = self.depthwise(x)
        out = self.pointwise(out)
        return out
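For context, a quick shape check of that module (the channel and image sizes below are just an example I'm adding):

import torch

conv = depthwise_separable_conv(32, 64)
x = torch.randn(1, 32, 28, 28)
print(conv(x).shape)  # torch.Size([1, 64, 28, 28])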
I haven't seen any usage of groups in CNNs before, and the documentation is a bit sparse on it:

groups - controls the connections between inputs and outputs. in_channels and out_channels must both be divisible by groups.
So my questions are:
(I guess this is more of a general question, not PyTorch-specific.)
Upvotes: 3
Views: 5369
Perhaps you're looking at an older version of the docs. The 1.0.1 documentation for nn.Conv2d expands on this:
Groups controls the connections between inputs and outputs. in_channels and out_channels must both be divisible by groups. For example,
At groups=1, all inputs are convolved to all outputs.
At groups=2, the operation becomes equivalent to having two conv layers side by side, each seeing half the input channels, and producing half the output channels, and both subsequently concatenated.
At groups=in_channels, each input channel is convolved with its own set of filters, of size floor(c_out / c_in).
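To see the groups=2 case concretely, here's a small numerical check (a sketch of mine, not from the docs): a single grouped conv matches two half-sized convs run side by side on half the channels each, then concatenated.

import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(1, 4, 8, 8)

# One grouped conv: 4 -> 6 channels, split into 2 groups
grouped = nn.Conv2d(4, 6, kernel_size=3, padding=1, groups=2, bias=False)

# Two convs side by side, each seeing half the inputs, producing half the outputs
a = nn.Conv2d(2, 3, kernel_size=3, padding=1, bias=False)
b = nn.Conv2d(2, 3, kernel_size=3, padding=1, bias=False)
a.weight.data = grouped.weight.data[:3]  # filters for the first group
b.weight.data = grouped.weight.data[3:]  # filters for the second group

side_by_side = torch.cat([a(x[:, :2]), b(x[:, 2:])], dim=1)
print(torch.allclose(grouped(x), side_by_side, atol=1e-6))  # True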
If you prefer a more mathematical description, start by thinking of a 1x1 convolution with groups=1 (the default). It is essentially a full matrix applied across all channels at each (h, w) location. Setting groups to higher values turns this matrix into a block-diagonal sparse matrix, with the number of blocks equal to groups. With groups=in_channels you get a diagonal matrix.
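To make the matrix picture concrete, this sketch (my own illustration) assembles the per-pixel c_out x c_in matrix of a grouped 1x1 convolution; with groups=2 only the two diagonal blocks are nonzero, and applying the matrix at every pixel reproduces the conv:

import torch
import torch.nn as nn

c_in, c_out, groups = 4, 4, 2
conv = nn.Conv2d(c_in, c_out, kernel_size=1, groups=groups, bias=False)

# Build the (c_out x c_in) matrix the conv applies at every (h, w) location;
# only the diagonal blocks are nonzero
W = torch.zeros(c_out, c_in)
bo, bi = c_out // groups, c_in // groups
for g in range(groups):
    W[g*bo:(g+1)*bo, g*bi:(g+1)*bi] = conv.weight.data[g*bo:(g+1)*bo, :, 0, 0]

x = torch.randn(1, c_in, 5, 5)
y = torch.einsum('oi,bihw->bohw', W, x)  # per-pixel matrix multiply
print(torch.allclose(conv(x), y, atol=1e-6))  # True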
Now, if the kernel is larger than 1x1, you retain the channel-wise block-sparsity as above, but allow for larger spatial kernels. I suggest rereading the groups=2 excerpt from the docs quoted above; it describes exactly that scenario in yet another way, which may help with understanding. Hope this helps.
Edit: Why would anybody want to use it? Either as a constraint (prior) on the model or as a performance-improvement technique; sometimes both. In the linked thread the idea is to replace an NxN, groups=1 2d convolution with a sequence of NxN, groups=n_features -> 1x1, groups=1 convolutions. Mathematically this results in a single convolution (since a convolution of a convolution is still a convolution), but the "product" convolution matrix is more sparse, which reduces the number of parameters and the computational complexity. This seems to be a reasonable resource explaining it in more depth.
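For a rough feel of the savings, here's a parameter count comparison (the channel sizes are arbitrary examples of mine):

import torch.nn as nn

def n_params(m):
    return sum(p.numel() for p in m.parameters())

nin, nout = 64, 128
dense = nn.Conv2d(nin, nout, kernel_size=3, padding=1)
separable = nn.Sequential(
    nn.Conv2d(nin, nin, kernel_size=3, padding=1, groups=nin),  # depthwise
    nn.Conv2d(nin, nout, kernel_size=1),                        # pointwise
)
print(n_params(dense), n_params(separable))  # 73856 vs 8960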
Upvotes: 8