kentwait

Reputation: 2071

Mean of minibatch cross-entropy to optimize in tensorflow

I tried to follow along with Martin Gorner's lecture on using TensorFlow and also the tutorial in the official TensorFlow documentation.

I'm confused about why, in Gorner's lecture, he uses the negative sum of the element-wise product of the labels and the log of the predictions (cross-entropy summed over the whole minibatch), while the TensorFlow tutorial computes the same quantity but then divides by the batch size to get the mean over each minibatch.

Basically, both will work as long as you scale the learning rate accordingly, but I don't understand the reason for the difference between the two methods.
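For reference, here is a minimal sketch of the two formulations being compared (the names Y_ for one-hot labels and Y for softmax predictions, and the random data, are assumptions for illustration, not the exact code from either source):

```python
import tensorflow as tf

# Assumed setup for illustration: a minibatch of 100 examples, 10 classes,
# with random one-hot labels Y_ and random softmax predictions Y.
Y_ = tf.one_hot(tf.random.uniform([100], maxval=10, dtype=tf.int32), depth=10)
Y = tf.nn.softmax(tf.random.normal([100, 10]))

# Gorner's lecture: cross-entropy summed over the whole minibatch.
cross_entropy_sum = -tf.reduce_sum(Y_ * tf.math.log(Y))

# TensorFlow tutorial: the same quantity averaged over the minibatch,
# i.e. the mean over examples of the per-example cross-entropy.
cross_entropy_mean = tf.reduce_mean(-tf.reduce_sum(Y_ * tf.math.log(Y), axis=1))

# The two differ only by a constant factor equal to the batch size:
# cross_entropy_sum == 100 * cross_entropy_mean
```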

Upvotes: 0

Views: 245

Answers (2)

lballes

Reputation: 1502

Using the mean instead of the sum makes the magnitude of the objective function invariant to the choice of mini-batch size. Hence, when you decide to change the mini-batch size, you can expect the same learning rate as before to still work well.

The same holds for other hyper-parameters, e.g., the L2 regularization factor.
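To make that concrete, here is a small sketch using a made-up toy per-example loss (not the cross-entropy from the question) showing that the gradient of the summed loss grows with the batch size while the gradient of the mean does not, which is why the same learning rate keeps producing steps of the same size:

```python
import tensorflow as tf

w = tf.Variable(1.0)

def batch_grads(batch_size):
    x = tf.ones([batch_size])           # toy minibatch of identical examples
    with tf.GradientTape(persistent=True) as tape:
        per_example = (w * x) ** 2      # toy per-example loss, gradient 2w each
        loss_sum = tf.reduce_sum(per_example)
        loss_mean = tf.reduce_mean(per_example)
    return tape.gradient(loss_sum, w).numpy(), tape.gradient(loss_mean, w).numpy()

print(batch_grads(10))    # (20.0, 2.0)  -- sum-gradient scales with batch size
print(batch_grads(100))   # (200.0, 2.0) -- mean-gradient stays the same
```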

Upvotes: 2

mhbashari

Reputation: 482

It seems that the mean also keeps the loss value itself at a controlled scale when it would otherwise grow very large. With the sum, there is no guarantee that values stay on comparable scales; with the mean, you can be sure that no single value grows out of proportion.

Upvotes: 0
