Kev1n91

Reputation: 3693

Why is most of the time spent in the fully connected layers even though their complexity is lower than that of the conv layers?

When benchmarking CNNs I found that most of the time is spent in the fully-connected layers. But when I calculate the computational complexity, I get:

O(conv) = N * (D * (W+P) * (H+P) * h * w) / S
O(fully_connected) = D * W * H * N

Where

D    = depth of the input (number of channels)
W, w = width of the input, width of the filter
H, h = height of the input, height of the filter
S    = stride
P    = padding
N    = number of outputs

For example, take a 1024x11x11 input feature map (DxWxH), a 5x5 filter (h, w), no padding (P = 0), a stride S of 1, and N = 512 outputs.

This results in the following calculation for the convolution:

O(conv) = 512*(1024*11*11*5*5)/1 = 1 585 971 200

If the same input is fed into a fully connected layer, and the desired number of outputs is still 512, then:

O(fully_connected) = 512*1024*11*11 = 63 438 848
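
As a quick sanity check of the arithmetic, here is a small Python sketch that just plugs these numbers into the two formulas above (the helper names ops_conv and ops_fc are only illustrative, not from any library):

    # Plug the example values into the two complexity formulas from above.
    def ops_conv(N, D, W, H, h, w, P=0, S=1):
        return N * (D * (W + P) * (H + P) * h * w) // S

    def ops_fc(N, D, W, H):
        return D * W * H * N

    print(ops_conv(N=512, D=1024, W=11, H=11, h=5, w=5))  # 1585971200
    print(ops_fc(N=512, D=1024, W=11, H=11))              # 63438848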

Is this due to the more advanced methods for parallelising convolutional layers on a GPU, so that the conv layer has more operations but less computation time because of parallelism issues? Or is my way of calculating the complexity of each layer simply wrong?
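
The question does not say how the benchmark was run; as one way to reproduce the comparison with the shapes above, here is a minimal timing sketch assuming PyTorch (for GPU timings you would additionally need torch.cuda.synchronize() before reading the clock):

    import time
    import torch

    x = torch.randn(1, 1024, 11, 11)                  # 1024x11x11 input, batch size 1
    conv = torch.nn.Conv2d(1024, 512, kernel_size=5)  # 512 filters of size 5x5, stride 1, no padding
    fc = torch.nn.Linear(1024 * 11 * 11, 512)         # same input flattened, 512 outputs

    def bench(fn, reps=100):
        fn()                                           # warm-up
        t0 = time.perf_counter()
        for _ in range(reps):
            fn()
        return (time.perf_counter() - t0) / reps

    print("conv:", bench(lambda: conv(x)))
    print("fc:  ", bench(lambda: fc(x.reshape(1, -1))))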

Upvotes: 5

Views: 1806

Answers (1)

Martin Thoma

Reputation: 136625

You can check whether it is only the implementation by converting the fully-connected layer into an equivalent convolution. For every fully connected layer there is an equivalent convolutional layer (see my question for details and examples):

  1. You have c channels of size w × h (hence the shape c × w × h) followed by a fully-connected layer with n nodes.
  2. Add a reshape layer after the channels to get (c ⋅ w ⋅ h) × 1 × 1.
  3. Add a convolutional layer with n filters of size 1 × 1.

Now compare the timings. If the converted (convolutional) version is faster than the fully connected layer, then the difference is simply due to a better implementation of convolution.
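
As a minimal sketch of steps 1-3, again assuming PyTorch (the answer itself is framework-agnostic), the snippet below builds the equivalent 1x1 convolution and verifies that it computes the same mapping as the fully connected layer, so the two can then be timed against each other:

    import torch

    c, w, h, n = 1024, 11, 11, 512
    x = torch.randn(1, c, w, h)

    fc = torch.nn.Linear(c * w * h, n)

    # Equivalent layer: reshape the input to (c*w*h) x 1 x 1, then apply n filters
    # of size 1x1 carrying the same weights and bias as the fully connected layer.
    conv1x1 = torch.nn.Conv2d(c * w * h, n, kernel_size=1)
    conv1x1.weight.data = fc.weight.data.view(n, c * w * h, 1, 1)
    conv1x1.bias.data = fc.bias.data

    out_fc = fc(x.reshape(1, -1))
    out_conv = conv1x1(x.reshape(1, c * w * h, 1, 1)).reshape(1, -1)
    print(torch.allclose(out_fc, out_conv, atol=1e-5))  # True: identical mapping, different kernels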

Upvotes: 3
