Reputation: 3693
When benchmarking CNNs I found that most of the time is spent in the fully-connected layers. But when I calculate the computational complexity, I get:
O(conv) = N * (D * (W+P) * (H+P) * h * w) / S
O(fully_connected) = D * W * H * N
where
D = input depth (number of channels)
W, w = input width, filter width
H, h = input height, filter height
S = stride
P = padding
N = number of outputs (filters or neurons)
For example, take a 1024×11×11 input feature map (D×W×H), a 5×5 filter (h×w), no padding (P = 0), a stride S of 1, and N = 512 outputs.
This results in the following calculation for the convolution:
O(conv) = 512*(1024*11*11*5*5)/1 = 1 585 971 200
If the same input is fed to a fully-connected layer, and the desired number of outputs is still 512, then:
O(fully_connected) = 512*1024*11*11 = 63 438 848
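To make the arithmetic reproducible, here is a small Python sketch of the two formulas above (the function names are my own, purely illustrative):

```python
def conv_ops(N, D, W, H, w, h, S=1, P=0):
    # O(conv) = N * (D * (W+P) * (H+P) * h * w) / S, as defined above
    return N * (D * (W + P) * (H + P) * h * w) // S

def fc_ops(N, D, W, H):
    # O(fully_connected) = D * W * H * N, as defined above
    return D * W * H * N

print(conv_ops(N=512, D=1024, W=11, H=11, w=5, h=5))  # 1585971200
print(fc_ops(N=512, D=1024, W=11, H=11))              # 63438848
```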
Is this because convolutional layers parallelize much better on a GPU, so the conv layer performs more operations yet takes less time? Or is my way of calculating the complexity of each layer simply wrong?
Upvotes: 5
Views: 1806
Reputation: 136625
You can check whether it is only the implementation by converting the fully-connected layer into an equivalent convolution. For every fully-connected layer, there is an equivalent convolutional layer (see my question for details and examples).
Suppose the fully-connected layer with n nodes is preceded by c channels of size w × h (hence the shape c × w × h). Then the equivalent convolution is obtained by reshaping the input to (c ⋅ w ⋅ h) × 1 × 1 and applying n filters of size 1 × 1.

Now check the time. If it is faster than the fully-connected layer, then it is simply due to a better implementation of convolution.
Upvotes: 3