arcoxia tom

Reputation: 1711

TensorFlow MNIST tutorial - cross_entropy calculation

I'm following this tutorial for TensorFlow:

It describes the implementation of the cross entropy function as:

cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))

First, tf.log computes the logarithm of each element of y. Next, we multiply each element of y_ with the corresponding element of tf.log(y). Then tf.reduce_sum adds the elements in the second dimension of y, due to the reduction_indices=1 parameter. Finally, tf.reduce_mean computes the mean over all the examples in the batch.
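To be concrete, here is how I have broken that line apart while trying to follow it (the comments are my own guesses):

log_y = tf.log(y)                                             # log of each element of y
product = y_ * log_y                                          # the step I'm unsure about
per_example = -tf.reduce_sum(product, reduction_indices=[1])  # one number per row
cross_entropy = tf.reduce_mean(per_example)                   # mean over the batch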

It is my understanding, from reading the tutorial, that both the actual and predicted values of y are 2D tensors. The rows correspond to the number of MNIST vectors you use, and the columns to their size of 784.

The quote above says that "we multiply each element of y_ with the corresponding element of tf.log(y)".

My question is: are we doing traditional matrix multiplication here, i.e. row × column? The sentence quoted above suggests that we are not.

Upvotes: 0

Views: 326

Answers (2)

Ekaba Bisong

Reputation: 2982

Traditional matrix multiplication is only used when calculating the model hypothesis, as seen in the code where x is multiplied by W:

y = tf.nn.softmax(tf.matmul(x, W) + b)
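For shape context, the tutorial defines these tensors roughly as follows, so tf.matmul(x, W) really is a row-by-column matrix product:

x = tf.placeholder(tf.float32, [None, 784])  # batch of flattened 28x28 images
W = tf.Variable(tf.zeros([784, 10]))         # weights: 784 pixels -> 10 classes
b = tf.Variable(tf.zeros([10]))              # one bias per class
# (batch, 784) x (784, 10) -> (batch, 10): one row of class scores per image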

The expression y_ * tf.log(y) in the code block:

cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y),
                                              reduction_indices=[1]))

performs an element-wise multiplication of the original targets y_ with the log of the predicted targets y.
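To see the difference concretely, here is a small sketch with made-up values contrasting the two operations:

import tensorflow as tf  # TensorFlow 1.x, as in the tutorial

a = tf.constant([[1., 2.], [3., 4.]])
b = tf.constant([[5., 6.], [7., 8.]])

elementwise = a * b            # [[ 5., 12.], [21., 32.]] -- a[i][j] * b[i][j]
matmul      = tf.matmul(a, b)  # [[19., 22.], [43., 50.]] -- row x column

with tf.Session() as sess:
    print(sess.run(elementwise))
    print(sess.run(matmul))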

The goal of the cross-entropy loss function is to measure how well the predicted probability that an observation belongs to a particular class matches its true class in the classification problem.

It is this measure (i.e., the cross-entropy loss) that is minimized by the optimization function, of which Gradient Descent is a popular example, to find the set of parameters W that gives the best performance for the classifier. We say the loss is minimized because the lower the loss, or cost of error, the better the model.
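If it helps, the minimization step looks like the following in TensorFlow 1.x (0.5 is the learning rate, assuming I remember the tutorial's value correctly):

# Gradient Descent nudges W and b in the direction that lowers the loss
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)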

Upvotes: 1

dgumo

Reputation: 1878

We are doing element-wise multiplication here: y_ * tf.log(y)
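A tiny sketch with toy values (two examples, two classes) shows why element-wise is the right operation here: with one-hot targets, only the log-probability of each example's true class survives the multiplication.

import tensorflow as tf  # TensorFlow 1.x

y_ = tf.constant([[0., 1.], [1., 0.]])      # one-hot targets
y  = tf.constant([[0.3, 0.7], [0.6, 0.4]])  # predicted probabilities

with tf.Session() as sess:
    # matching positions are multiplied; zeros in y_ wipe out the rest
    print(sess.run(y_ * tf.log(y)))  # roughly [[0., -0.357], [-0.511, 0.]]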

Upvotes: 0
