Reputation: 2197
I was wondering what the difference is between the Activation layer and the Dense layer in Keras.
Since the Activation layer seems to be a fully connected layer, and Dense has a parameter for passing an activation function, what is the best practice?
Let's imagine a fictional network like this: Input -> Dense -> Dropout -> Final Layer. Should the final layer be Dense(activation=softmax) or Activation(softmax)? Which is cleanest, and why?
Thanks everyone!
Upvotes: 32
Views: 13766
Reputation: 9336
As @MarcinMożejko said, it is equivalent. I just want to explain why. If you look at the Dense Keras documentation page, you'll see that the default activation function is None.
Mathematically, a dense layer is:
a = g(W.T*a_prev+b)
where g is an activation function. With Dense(units=k, activation=softmax), all of these quantities are computed in one shot. With Dense(units=k) followed by Activation('softmax'), the model first calculates the quantity W.T*a_prev+b (because the default activation function is None) and then applies the activation function passed to the Activation layer to that quantity.
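To make the equivalence concrete, here is a minimal NumPy sketch of the two computations. It is only an illustration, not Keras internals: the array shapes, the random seed, and the `softmax` helper are assumptions of mine, and it uses the row-vector batch convention (`a_prev @ W + b`) rather than the column-vector form `W.T*a_prev+b` written above.

```python
import numpy as np

def softmax(z):
    # Subtract the row-wise max for numerical stability.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
a_prev = rng.normal(size=(4, 3))  # hypothetical batch: 4 samples, 3 features
W = rng.normal(size=(3, 5))       # kernel with shape (features, units)
b = rng.normal(size=(5,))         # bias, one entry per unit

# One shot, as in Dense(units=5, activation='softmax')
fused = softmax(a_prev @ W + b)

# Two steps, as in Dense(units=5) followed by Activation('softmax')
z = a_prev @ W + b                # linear part only (activation is None)
split = softmax(z)                # activation applied afterwards

same = np.allclose(fused, split)
```

Both paths apply the same function to the same linear output, so `same` should come out true; the only difference is whether the intermediate `z` is exposed as its own step.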
Upvotes: 1
Reputation: 40516
Using Dense(activation=softmax) is computationally equivalent to first adding Dense and then adding Activation(softmax). However, the second approach has one advantage: you can retrieve the outputs of the last layer (before activation) from a model defined this way. With the first approach, that is impossible.
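A rough sketch of what retrieving the pre-activation outputs could look like with the Keras functional API. The layer sizes, the layer names `logits` and `probs`, and the random input batch are made up for illustration; the network follows the Input -> Dense -> Dropout -> Final Layer shape from the question.

```python
import numpy as np
from tensorflow import keras

# Hypothetical sizes: 8 input features, 16 hidden units, 3 classes.
inputs = keras.Input(shape=(8,))
x = keras.layers.Dense(16, activation="relu")(inputs)
x = keras.layers.Dropout(0.5)(x)

# Final layer split in two: linear Dense, then a separate Activation.
logits = keras.layers.Dense(3, name="logits")(x)
probs = keras.layers.Activation("softmax", name="probs")(logits)

model = keras.Model(inputs, probs)

# Second model sharing the same layers, ending before the softmax.
logit_model = keras.Model(inputs, logits)

batch = np.random.rand(2, 8).astype("float32")
p = model.predict(batch, verbose=0)        # softmax probabilities
z = logit_model.predict(batch, verbose=0)  # pre-activation outputs
```

Because `logit_model` reuses the layers of `model`, no weights are copied and training `model` also updates what `logit_model` returns.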
Upvotes: 47