Reputation: 313
I haven't found a calculation of the parameters (weights + biases) of AlexNet, so I tried to calculate it myself, but I'm not sure if it's correct:
conv1: (11*11)*3*96 + 96 = 34944
conv2: (5*5)*96*256 + 256 = 614656
conv3: (3*3)*256*384 + 384 = 885120
conv4: (3*3)*384*384 + 384 = 1327488
conv5: (3*3)*384*256 + 256 = 884992
fc1: (6*6)*256*4096 + 4096 = 37752832
fc2: 4096*4096 + 4096 = 16781312
fc3: 4096*1000 + 1000 = 4097000
This gives a total of 62378344 parameters. Is that calculation right?
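For reference, here is a small Python sketch of the same tally (the layer shapes are the ones listed above, i.e. the single-stream version without grouping):

    # Per-layer parameter counts (weights + biases) for single-stream AlexNet.
    # Conv layers: (kernel_h, kernel_w, input_channels, output_channels);
    # fc1 sees the flattened 6x6x256 output of the last pooling layer,
    # fc2/fc3 are listed with 1x1 "kernels" so the same formula applies.
    layers = [
        ("conv1", 11, 11, 3, 96),
        ("conv2", 5, 5, 96, 256),
        ("conv3", 3, 3, 256, 384),
        ("conv4", 3, 3, 384, 384),
        ("conv5", 3, 3, 384, 256),
        ("fc1", 6, 6, 256, 4096),
        ("fc2", 1, 1, 4096, 4096),
        ("fc3", 1, 1, 4096, 1000),
    ]

    total = 0
    for name, kh, kw, c_in, c_out in layers:
        params = kh * kw * c_in * c_out + c_out  # weights + biases
        total += params
        print(f"{name}: {params}")

    print(f"total: {total}")  # 62378344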
Upvotes: 21
Views: 21241
Reputation: 11
In the original paper, it says "The second convolutional layer takes as input the (response-normalized and pooled) output of the first convolutional layer and filters it with 256 kernels of size 5 * 5 * 48. The third, fourth, and fifth convolutional layers are connected to one another without any intervening pooling or normalization layers. The third convolutional layer has 384 kernels of size 3 * 3 * 256 connected to the (normalized, pooled) outputs of the second convolutional layer. The fourth convolutional layer has 384 kernels of size 3 * 3 * 192, and the fifth convolutional layer has 256 kernels of size 3 * 3 * 192. The fully-connected layers have 4096 neurons each."
Therefore, the calculation of conv2, conv4 and conv5 should be:
conv2: (5 * 5 * 48) * 256 + 256
conv4: (3 * 3 * 192) * 384 + 384
conv5: (3 * 3 * 192) * 256 + 256
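Redoing the question's tally with those per-GPU kernel depths (a rough Python sketch; conv1, conv3 and the fully-connected layers are kept as in the question, since they connect to both GPUs) gives roughly 60.97M parameters:

    # Same tally as in the question, but with the per-GPU kernel depths
    # from the paper for the grouped layers (conv2, conv4, conv5).
    grouped_counts = {
        "conv1": (11 * 11 * 3) * 96 + 96,      # 34944
        "conv2": (5 * 5 * 48) * 256 + 256,     # 307456
        "conv3": (3 * 3 * 256) * 384 + 384,    # 885120 (spans both GPUs)
        "conv4": (3 * 3 * 192) * 384 + 384,    # 663936
        "conv5": (3 * 3 * 192) * 256 + 256,    # 442624
        "fc1":   (6 * 6 * 256) * 4096 + 4096,  # 37752832
        "fc2":   4096 * 4096 + 4096,           # 16781312
        "fc3":   4096 * 1000 + 1000,           # 4097000
    }
    print(sum(grouped_counts.values()))  # 60965224, close to the ~60M often quoted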
Upvotes: 1
Reputation: 2927
Your calculations are correct. We came up with the exact same number independently while writing this blog post. I have also added the final table from that post.
Upvotes: 10
Reputation: 641
According to the diagram in their paper, some of the layers use grouping, so not all feature maps of one layer are connected to the next. This means, e.g., that conv2 should have only (5*5)*48*256 + 256 = 307,456 parameters.
I'm not sure if all newer implementations include the grouping. It was an optimization used to train the network in parallel on two GPUs; modern GPUs have enough resources to fit the network comfortably without it.
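As a quick illustration (a hypothetical helper, not from any particular framework), grouping divides the number of input channels each kernel sees, which halves the weight count for a two-GPU split:

    def conv_params(kernel_size, c_in, c_out, groups=1):
        """Weights + biases of a conv layer; with grouping, each output map
        only sees c_in / groups input maps, so the weight count shrinks."""
        kh, kw = kernel_size
        return kh * kw * (c_in // groups) * c_out + c_out

    # conv2 without grouping vs. with the two-GPU split from the paper
    print(conv_params((5, 5), 96, 256))            # 614656
    print(conv_params((5, 5), 96, 256, groups=2))  # 307456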
Upvotes: 2
Reputation: 1323
Slide 8 in this presentation states it has 60M parameters, so I think you're at least in the ballpark. http://vision.stanford.edu/teaching/cs231b_spring1415/slides/alexnet_tugce_kyunghee.pdf
Upvotes: 4