Tobi

Reputation: 313

How to calculate the number of parameters of AlexNet?

I haven't found a calculation of the parameters (weights + biases) of AlexNet, so I tried to calculate it myself, but I'm not sure if it's correct:

conv1: (11*11)*3*96 + 96 = 34944

conv2: (5*5)*96*256 + 256 = 614656

conv3: (3*3)*256*384 + 384 = 885120

conv4: (3*3)*384*384 + 384 = 1327488

conv5: (3*3)*384*256 + 256 = 884992

fc1: (6*6)*256*4096 + 4096 = 37752832

fc2: 4096*4096 + 4096 = 16781312

fc3: 4096*1000 + 1000 = 4097000

This results in a total of 62,378,344 parameters. Is that calculation right?
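As a sanity check, here is a short Python sketch that reproduces the per-layer counts above; the layer shapes are taken straight from my calculation:

```python
# Per-layer parameter counts for AlexNet without grouping.
# Conv layers: (kernel_h, kernel_w, in_channels, out_channels);
# fully connected layers: (in_features, out_features).
layers = {
    "conv1": (11, 11, 3, 96),
    "conv2": (5, 5, 96, 256),
    "conv3": (3, 3, 256, 384),
    "conv4": (3, 3, 384, 384),
    "conv5": (3, 3, 384, 256),
    "fc1":   (6 * 6 * 256, 4096),  # conv5 output flattened to 6*6*256
    "fc2":   (4096, 4096),
    "fc3":   (4096, 1000),
}

total = 0
for name, shape in layers.items():
    if name.startswith("conv"):
        kh, kw, cin, cout = shape
        params = kh * kw * cin * cout + cout  # weights + biases
    else:
        fin, fout = shape
        params = fin * fout + fout  # weights + biases
    total += params
    print(f"{name}: {params}")

print(f"total: {total}")  # 62378344
```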

Upvotes: 21

Views: 21241

Answers (4)

Zhimin Shao

Reputation: 11

In the original paper, it says "The second convolutional layer takes as input the (response-normalized and pooled) output of the first convolutional layer and filters it with 256 kernels of size 5 * 5 * 48. The third, fourth, and fifth convolutional layers are connected to one another without any intervening pooling or normalization layers. The third convolutional layer has 384 kernels of size 3 * 3 * 256 connected to the (normalized, pooled) outputs of the second convolutional layer. The fourth convolutional layer has 384 kernels of size 3 * 3 * 192 , and the fifth convolutional layer has 256 kernels of size 3 * 3 * 192. The fully-connected layers have 4096 neurons each."

Therefore, the calculation of conv2, conv4 and conv5 should be:

conv2: (5 * 5 * 48) * 256 + 256

conv4: (3 * 3 * 192) * 384 + 384

conv5: (3 * 3 * 192) * 256 + 256
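A quick follow-up computation (my own arithmetic, combining the grouped shapes quoted above with the ungrouped layers from the question) gives a total of about 61M parameters:

```python
# Parameter counts with the paper's two-GPU grouping applied
# to conv2, conv4 and conv5 (kernel depth is halved there).
grouped = {
    "conv1": 11 * 11 * 3 * 96 + 96,
    "conv2": 5 * 5 * 48 * 256 + 256,    # grouped: depth 48, not 96
    "conv3": 3 * 3 * 256 * 384 + 384,   # not grouped (connects across GPUs)
    "conv4": 3 * 3 * 192 * 384 + 384,   # grouped: depth 192, not 384
    "conv5": 3 * 3 * 192 * 256 + 256,   # grouped: depth 192, not 384
    "fc1":   6 * 6 * 256 * 4096 + 4096,
    "fc2":   4096 * 4096 + 4096,
    "fc3":   4096 * 1000 + 1000,
}
for name, params in grouped.items():
    print(f"{name}: {params}")
print(f"total: {sum(grouped.values())}")  # 60965224, i.e. ~61M
```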

Upvotes: 1

Satya Mallick

Reputation: 2927

Your calculations are correct. We came up with the exact same number independently while writing this blog post. I have also included the final table from the post below.

[image: per-layer parameter-count table from the blog post]

Upvotes: 10

Sven Zwei

Reputation: 641

According to the diagram in their paper, some of the layers use grouping, so not all feature maps of one layer are connected to all feature maps of the next. This means, e.g., that conv2 should have only (5*5)*48*256 + 256 = 307,456 parameters.

I'm not sure whether all newer implementations include the grouping. It was an optimization used to train the network in parallel across two GPUs; modern GPUs have enough memory to fit the network comfortably without grouping.
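If you want to check the grouped count mechanically, here is a minimal sketch using the groups argument of PyTorch's nn.Conv2d (assuming you have torch installed; the layer shape is the conv2 from above):

```python
import torch.nn as nn

# conv2 with the two-GPU grouping: 96 input channels split into
# 2 groups of 48, 256 output channels, 5x5 kernels.
conv2 = nn.Conv2d(in_channels=96, out_channels=256,
                  kernel_size=5, groups=2)

n_params = sum(p.numel() for p in conv2.parameters())
print(n_params)  # 307456 = (5*5*48)*256 + 256
```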

Upvotes: 2

Alex Klibisz

Reputation: 1323

Slide 8 in this presentation states that AlexNet has 60M parameters, so I think you're at least in the ballpark: http://vision.stanford.edu/teaching/cs231b_spring1415/slides/alexnet_tugce_kyunghee.pdf

Upvotes: 4
