Reputation: 322
What is the difference between using softmax as a sequential layer in tf.keras and softmax as an activation function for a dense layer?
tf.keras.layers.Dense(10, activation=tf.nn.softmax)
and
tf.keras.layers.Softmax(10)
Upvotes: 8
Views: 1801
Reputation: 22031
They are the same; you can test it yourself:
# imports needed to run this snippet
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense, Softmax

# generate random input data: a batch of 5 samples with 20 features
x = np.random.uniform(0, 1, (5, 20)).astype('float32')

# 1st option: softmax as the activation of a Dense layer
X = Dense(10, activation=tf.nn.softmax)
A = X(x)

# 2nd option: the same affine transform followed by a separate Softmax layer
w, b = X.get_weights()
B = Softmax()(tf.matmul(x, w) + b)

tf.reduce_all(A == B)
# <tf.Tensor: shape=(), dtype=bool, numpy=True>
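For completeness, here is a minimal sketch of how the two equivalent formulations look when defined as sequential models, which is what the question is asking about; the input shape of 20 is arbitrary and just matches the data above:
model_a = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])
model_b = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(10),   # linear outputs (logits)
    tf.keras.layers.Softmax()    # softmax applied as its own layer
])
Both models compute exactly the same function.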
Also pay attention when using tf.keras.layers.Softmax: it does not take a number of units, since it is a simple activation layer (in Softmax(10) from the question, the 10 would actually be interpreted as the axis argument).
By default the softmax is computed over the last axis (-1). If you have outputs with more than 2 dimensions and want to apply the softmax along a different dimension, you can change this via the axis argument, which is easy to do with the second option.
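For example, here is a small sketch of the axis argument on a 3D tensor; the shape (2, 3, 4) is made up for illustration:
logits = tf.random.normal((2, 3, 4))  # e.g. (batch, timesteps, classes)
# default: softmax over the last axis, each group of 4 values sums to 1
p_last = tf.keras.layers.Softmax()(logits)
# softmax over axis 1 instead, each group of 3 values sums to 1
p_time = tf.keras.layers.Softmax(axis=1)(logits)
tf.reduce_sum(p_last, axis=-1)  # all ones
tf.reduce_sum(p_time, axis=1)   # all ones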
Upvotes: 6