MBT

Reputation: 24129

Groups in Convolutional Neural Network / CNN

I came across this PyTorch example for depthwise separable convolutions using the groups parameter:

import torch.nn as nn

class depthwise_separable_conv(nn.Module):
    def __init__(self, nin, nout):
        super(depthwise_separable_conv, self).__init__()
        # Depthwise: groups=nin gives each input channel its own 3x3 filter
        self.depthwise = nn.Conv2d(nin, nin, kernel_size=3, padding=1, groups=nin)
        # Pointwise: a 1x1 conv that mixes the channels
        self.pointwise = nn.Conv2d(nin, nout, kernel_size=1)

    def forward(self, x):
        out = self.depthwise(x)
        out = self.pointwise(out)
        return out

I haven't seen any usage of groups in CNNs before. The documentation is also a bit sparse on this point:

groups controls the connections between inputs and outputs. in_channels and out_channels must both be divisible by groups.

So my questions are: what does groups actually do, and why would anyone want to use it?

(I guess this is more of a general question, not PyTorch-specific.)

Upvotes: 3

Views: 5369

Answers (1)

Jatentaki

Reputation: 13113

Perhaps you're looking at an older version of the docs; the 1.0.1 documentation for nn.Conv2d expands on this.

Groups controls the connections between inputs and outputs. in_channels and out_channels must both be divisible by groups. For example,

At groups=1, all inputs are convolved to all outputs.

At groups=2, the operation becomes equivalent to having two conv layers side by side, each seeing half the input channels, and producing half the output channels, and both subsequently concatenated.

At groups=in_channels, each input channel is convolved with its own set of filters, of size floor(out_channels / in_channels).
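To see this concretely, here is a quick sketch (channel counts 8 and 16 are arbitrary) printing the weight shapes PyTorch allocates for different groups values; each filter only spans in_channels / groups input channels:

import torch.nn as nn

# Weight shape is (out_channels, in_channels / groups, kH, kW),
# so raising groups shrinks the per-filter channel dimension.
for g in (1, 2, 8):
    conv = nn.Conv2d(8, 16, kernel_size=3, groups=g)
    print(g, tuple(conv.weight.shape))
# 1 (16, 8, 3, 3)
# 2 (16, 4, 3, 3)
# 8 (16, 1, 3, 3)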

If you prefer a more mathematical description, start by thinking of a 1x1 convolution with groups=1 (default). It is essentially a full matrix applied across all channels at each (h, w) location. Setting groups to higher values turns this matrix into a block-diagonal sparse matrix, with the number of blocks equal to groups. With groups=in_channels you get a diagonal matrix.
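Here's a small sanity check of that view (sizes picked just for illustration): take a 1x1 conv with groups=2, assemble the block-diagonal channel-mixing matrix its weights imply, and verify both give the same output:

import torch
import torch.nn as nn

# A 1x1 conv with groups=2 acts as a block-diagonal matrix
# across channels at every (h, w) location.
conv = nn.Conv2d(4, 4, kernel_size=1, groups=2, bias=False)
w = conv.weight.squeeze(-1).squeeze(-1)   # shape (4, 2): the per-group blocks

# Equivalent dense 4x4 channel-mixing matrix: two 2x2 blocks
# on the diagonal, zeros everywhere else.
dense = torch.zeros(4, 4)
dense[:2, :2] = w[:2]   # group 0: outputs 0-1 from inputs 0-1
dense[2:, 2:] = w[2:]   # group 1: outputs 2-3 from inputs 2-3

x = torch.randn(1, 4, 5, 5)
out_conv = conv(x)
# Apply the dense matrix channel-wise at each spatial location.
out_mat = torch.einsum('oc,bchw->bohw', dense, x)
print(torch.allclose(out_conv, out_mat, atol=1e-6))  # True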

Now, if the kernel is larger than 1x1, you retain the channel-wise block-sparsity as above, but allow for larger spatial kernels. I suggest rereading the groups=2 excerpt from the docs quoted above; it describes exactly that scenario in yet another way, which may help with understanding. Hope this helps.
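As a concrete version of that excerpt (shapes are arbitrary), a groups=2 conv should match two independent convs, each seeing half the input channels, concatenated:

import torch
import torch.nn as nn

# groups=2 with a 3x3 kernel equals two side-by-side convs,
# each over half the channels, concatenated along the channel dim.
grouped = nn.Conv2d(4, 6, kernel_size=3, padding=1, groups=2, bias=False)

# Rebuild the two "side by side" convs from the grouped weights.
# grouped.weight has shape (6, 2, 3, 3): 3 output channels per group.
conv_a = nn.Conv2d(2, 3, kernel_size=3, padding=1, bias=False)
conv_b = nn.Conv2d(2, 3, kernel_size=3, padding=1, bias=False)
with torch.no_grad():
    conv_a.weight.copy_(grouped.weight[:3])
    conv_b.weight.copy_(grouped.weight[3:])

x = torch.randn(1, 4, 8, 8)
out_grouped = grouped(x)
out_split = torch.cat([conv_a(x[:, :2]), conv_b(x[:, 2:])], dim=1)
print(torch.allclose(out_grouped, out_split, atol=1e-6))  # True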

Edit: Why does anybody want to use it? Either as a constraint (prior) for the model or as a performance improvement technique; sometimes both. In the linked thread the idea is to replace an NxN, groups=1 2d conv with a sequence of NxN, groups=n_features -> 1x1, groups=1 convolutions. This mathematically results in a single convolution (since a convolution of a convolution is still a convolution), but makes the "product" convolution matrix sparser, reducing both the number of parameters and the computational cost. This seems to be a reasonable resource explaining it in more depth.
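As a rough illustration of the savings (64 and 128 channels chosen arbitrarily), compare parameter counts for a full 3x3 convolution against the depthwise + pointwise pair from the question:

import torch.nn as nn

# Full 3x3 conv vs. the depthwise (groups=nin) + pointwise (1x1) factorization.
nin, nout = 64, 128
full = nn.Conv2d(nin, nout, kernel_size=3, padding=1, bias=False)
depthwise = nn.Conv2d(nin, nin, kernel_size=3, padding=1, groups=nin, bias=False)
pointwise = nn.Conv2d(nin, nout, kernel_size=1, bias=False)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(full))                          # 73728
print(count(depthwise) + count(pointwise))  # 576 + 8192 = 8768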

Upvotes: 8
