Reputation: 322
What is the difference between using softmax as a sequential layer in tf.keras and softmax as an activation function for a dense layer?
tf.keras.layers.Dense(10, activation=tf.nn.softmax)
and
tf.keras.layers.Softmax(10)
Upvotes: 8
Views: 1801
Reputation: 22031
They are the same; you can test it yourself:
# imports needed to run this snippet
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense, Softmax

# generate random input data: a batch of 5 samples with 20 features
x = np.random.uniform(0, 1, (5, 20)).astype('float32')

# 1st option: softmax as the activation of a Dense layer
X = Dense(10, activation=tf.nn.softmax)
A = X(x)

# 2nd option: the same affine transform followed by a separate Softmax layer
w, b = X.get_weights()
B = Softmax()(tf.matmul(x, w) + b)

tf.reduce_all(A == B)
# <tf.Tensor: shape=(), dtype=bool, numpy=True>
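For completeness, here is a minimal sketch of how the two equivalent formulations look when defined as sequential models, which is what the question is asking about; the input shape of 20 is arbitrary and just matches the data above:
model_a = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])
model_b = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(10),   # linear outputs (logits)
    tf.keras.layers.Softmax()    # softmax applied as its own layer
])
Both models compute exactly the same function.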
Also pay attention when using tf.keras.layers.Softmax: it does not take a number of units, since it is a simple activation layer (in Softmax(10) from the question, the 10 would actually be interpreted as the axis argument).
By default the softmax is computed over the last axis (-1). If you have outputs with more than 2 dimensions and want to apply the softmax along a different dimension, you can change this via the axis argument, which is easy to do with the second option.
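For example, here is a small sketch of the axis argument on a 3D tensor; the shape (2, 3, 4) is made up for illustration:
logits = tf.random.normal((2, 3, 4))  # e.g. (batch, timesteps, classes)
# default: softmax over the last axis, each group of 4 values sums to 1
p_last = tf.keras.layers.Softmax()(logits)
# softmax over axis 1 instead, each group of 3 values sums to 1
p_time = tf.keras.layers.Softmax(axis=1)(logits)
tf.reduce_sum(p_last, axis=-1)  # all ones
tf.reduce_sum(p_time, axis=1)   # all ones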
Upvotes: 6