Reputation: 5565
I've implemented a basic neural network from scratch using TensorFlow and trained it on the Fashion-MNIST dataset. It trains correctly and reaches a test accuracy of around 88-90% over the 10 classes.
Now I've written a predict() function which predicts the class of a given image using the trained weights. Here is the code:
    def predict(images, trained_parameters):
        Ws, bs = [], []
        parameters = {}
        for param in trained_parameters.keys():
            parameters[param] = tf.convert_to_tensor(trained_parameters[param])

        X = tf.placeholder(tf.float32, [images.shape[0], None], name='X')

        Z_L = forward_propagation(X, trained_parameters)

        p = tf.argmax(Z_L)  # Working fine
        # p = tf.argmax(tf.nn.softmax(Z_L))  # not working if softmax is applied

        with tf.Session() as session:
            prediction = session.run(p, feed_dict={X: images})

        return prediction
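I call it roughly like the following sketch (illustrative names; test_images holds the flattened test set in the same features-first layout as the rest of the code, i.e. shape (784, num_examples)):

    # Illustrative call; test_images is (784, num_examples), features first.
    predictions = predict(test_images, trained_parameters)
    print(predictions[:10])  # one predicted class index (0-9) per example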
predict() uses the forward_propagation() function, which returns the weighted sum of the last layer (Z) and not the activations (A), because TensorFlow's tf.nn.softmax_cross_entropy_with_logits() requires Z instead of A, as it computes A itself by applying the softmax. Refer to this link for details.
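For reference, a minimal sketch of how the logits feed into that loss (illustrative names; Y stands for one-hot labels with the same (10, num_examples) layout, and the transposes put the batch dimension first because the op applies the softmax over its last axis):

    # Sketch only: Y is one-hot labels shaped (10, num_examples) like Z_L.
    # softmax_cross_entropy_with_logits reduces over its last axis, so the
    # batch dimension is moved to the front before calling it.
    cost = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=tf.transpose(Y),
                                                logits=tf.transpose(Z_L)))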
Now in the predict() function, when I make predictions using Z instead of A (the activations), it works correctly. But if I compute the softmax of Z (which gives the activations A of the last layer), it gives incorrect predictions.
Why does it give correct predictions on the weighted sums Z? Aren't we supposed to first apply the softmax activation (to get A) and then make predictions?
Here is the link to my colab notebook if anyone wants to look at my entire code: Link to Notebook Gist
So what am I missing here?
Upvotes: 2
Views: 766
Reputation: 11895
Most TF functions, such as tf.nn.softmax, assume by default that the batch dimension is the first one - that is a common practice - so the softmax is computed over the last axis. Now, I noticed in your code that your batch dimension is the second one, i.e. your output shape is (output_dim=10, batch_size=?), and as a result tf.nn.softmax is computing the softmax activation along the batch dimension.
There is nothing wrong with not following the conventions - one just needs to be aware of them. Computing the argmax of the softmax taken along the first axis should yield the desired results (it is equivalent to taking the argmax of the logits):

    p = tf.argmax(tf.nn.softmax(Z_L, axis=0))

I would also recommend computing the argmax along the first axis (axis=0) in case more than one image is fed into the network.
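To make the axis issue concrete, here is a minimal standalone sketch (TF 1.x, assuming a version recent enough that tf.nn.softmax accepts the axis argument; toy shape (classes=3, batch=2) mirroring your (10, batch) layout):

    import numpy as np
    import tensorflow as tf

    # Toy logits in the question's features-first layout: 3 classes, 2 examples.
    # For both columns (examples) the largest raw logit is class 0.
    Z = np.array([[2.0, 10.0],
                  [1.0,  0.0],
                  [0.0,  0.0]], dtype=np.float32)

    with tf.Session() as sess:
        # Default softmax normalizes over the LAST axis (here: the batch axis),
        # which rescales each row by a different amount and can change the
        # per-example argmax.
        p_wrong  = sess.run(tf.argmax(tf.nn.softmax(Z)))          # -> [1 0]
        # Normalizing over axis 0 (the class axis) preserves each column's
        # ordering, so the argmax agrees with the argmax of the raw logits.
        p_right  = sess.run(tf.argmax(tf.nn.softmax(Z, axis=0)))  # -> [0 0]
        p_logits = sess.run(tf.argmax(Z))                         # -> [0 0]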
Upvotes: 2