Reputation: 43
I was just going through the TensorFlow tutorial (https://www.tensorflow.org/versions/r0.8/tutorials/mnist/pros/index.html#deep-mnist-for-experts), and I have two questions about it:

Why does it use the cost function y_ * log(y)? Shouldn't it be y_ * log(y) + (1 - y_) * log(1 - y)?

How does TensorFlow know how to calculate the gradient of the cost function I use? Shouldn't we have to tell TensorFlow somewhere how to calculate the gradient?
Thanks!
Upvotes: 4
Views: 1379
Reputation: 8536
When the label y_ is a single scalar (1 or 0), you use y_ * log(y) + (1 - y_) * log(1 - y); but when y_ is one-hot encoded, i.e. [0 1] or [1 0], y_ * log(y) summed over the classes is enough. In fact, they are the same: for two classes, the (1 - y_) * log(1 - y) term of the binary form is exactly the other class's contribution to the one-hot sum.
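As a quick sketch with made-up numbers (not the tutorial's values), you can check in numpy that the two forms give the same cross-entropy for a two-class one-hot label:

    import numpy as np

    # Made-up values for illustration: true label is class 1, predicted
    # probability of class 1 is 0.8.
    y_true = 1.0
    p = 0.8

    # Binary form: -(y_ * log(y) + (1 - y_) * log(1 - y))
    binary_ce = -(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))

    # One-hot form: -sum(y_ * log(y)) over the two classes
    y_onehot = np.array([0.0, 1.0])   # one-hot encoding of label 1
    p_vec = np.array([1.0 - p, p])    # predicted distribution over both classes
    onehot_ce = -np.sum(y_onehot * np.log(p_vec))

    print(binary_ce, onehot_ce)       # both print 0.2231... -- the forms agree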
Everything in TensorFlow is a graph, including your cost function. Each node knows its own operation and its local gradient, so TensorFlow applies backpropagation (the chain rule) through the graph to compute the gradient automatically.
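As a minimal sketch (using the graph-style API of the r0.8 / TF 1.x era, with a hypothetical cost rather than the tutorial's), tf.gradients returns the symbolic gradient of any cost built from graph ops, with no gradient code written by you:

    import tensorflow as tf

    # Hypothetical cost (not the tutorial's): cost = x^2 + 3x, built from graph ops.
    x = tf.placeholder(tf.float32)
    cost = tf.square(x) + 3.0 * x

    # Each op (square, mul, add) has a registered gradient, so TensorFlow can
    # chain them together through the graph.
    grad = tf.gradients(cost, x)[0]   # symbolic gradient: 2x + 3

    with tf.Session() as sess:
        print(sess.run(grad, feed_dict={x: 2.0}))   # prints 7.0 = 2*2 + 3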
Upvotes: 5