Reputation: 528
I was trying a very simple classification example with TensorFlow. Instead of using one-hot vectors, tf.nn.softmax, and cross-entropy loss with logits, I wanted to use discrete 0/1 labels, where the output of the NN model would be 0 or 1. So I did something like this:

    y_ = tf.nn.sigmoid(tf.matmul(hidden, weight2) + bias2)
    y_ = tf.cast(tf.greater_equal(y_, 0.5), tf.float32)

This gives a tensor of 0s and 1s, but when I try to train I get an error saying "No gradients provided". Here is the full code: https://gist.github.com/kris-singh/54aecbc1d61f1d7d79a43ae2bfac8516

My question: is what I am trying to do possible in TF or not? If yes, how?
Upvotes: 0
Views: 984
Reputation: 866
You can absolutely train the network, but you need to remove the casting operation. Keeping the sigmoid output lets the network backpropagate errors from the classification training examples. If you want to binarize the predictions coming out of the model to measure accuracy, you can absolutely do so, just not as an integrated part of the network architecture.
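If it helps, here is a minimal TF1-style sketch of that separation (the placeholder shapes, layer sizes, and variable names are my assumptions, since I'm only guessing at what's in your gist):

    import tensorflow as tf

    # Assumed inputs: 784-dim features, a single 0/1 label per example.
    x = tf.placeholder(tf.float32, [None, 784])
    y = tf.placeholder(tf.float32, [None, 1])

    # Assumed hidden layer of 128 units.
    weight1 = tf.Variable(tf.truncated_normal([784, 128], stddev=0.1))
    bias1 = tf.Variable(tf.zeros([128]))
    hidden = tf.nn.relu(tf.matmul(x, weight1) + bias1)

    weight2 = tf.Variable(tf.truncated_normal([128, 1], stddev=0.1))
    bias2 = tf.Variable(tf.zeros([1]))
    logits = tf.matmul(hidden, weight2) + bias2

    # Train on the continuous, differentiable output (sigmoid cross-entropy
    # takes the raw logits and applies the sigmoid internally).
    loss = tf.reduce_mean(
        tf.nn.sigmoid_cross_entropy_with_logits(labels=y, logits=logits))
    train_step = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

    # Binarize only outside the training path, purely for evaluation.
    prediction = tf.cast(tf.greater_equal(tf.nn.sigmoid(logits), 0.5), tf.float32)
    accuracy = tf.reduce_mean(tf.cast(tf.equal(prediction, y), tf.float32))

You run train_step in your training loop and only fetch accuracy (or prediction) when you want to evaluate; the thresholding never sits between the loss and the trainable variables, so gradients flow normally.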
This approach is pretty common, in fact: for multi-class architectures, the softmax layer produces a probability vector, and that is what the network trains on. When using the network to predict classes, though, you'll frequently see people take that probabilistic output vector and force it to a one-hot vector (or just grab the index of the most probable class with argmax). But for backpropagation to work, it has to be able to calculate gradients of the error at the output, which precludes using rounding (or any other step function) as an integral part of the network.
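The multi-class case follows the same pattern; a sketch, reusing the same assumed hidden layer and assuming 10 classes (argmax is used only for reporting accuracy, never in the loss):

    num_classes = 10
    y_onehot = tf.placeholder(tf.float32, [None, num_classes])

    weight_out = tf.Variable(tf.truncated_normal([128, num_classes], stddev=0.1))
    bias_out = tf.Variable(tf.zeros([num_classes]))
    class_logits = tf.matmul(hidden, weight_out) + bias_out

    # Training uses the smooth softmax cross-entropy on the logits.
    multi_loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=y_onehot, logits=class_logits))

    # The hard argmax decision is only used when evaluating predictions.
    predicted_class = tf.argmax(class_logits, axis=1)
    correct = tf.equal(predicted_class, tf.argmax(y_onehot, axis=1))
    multi_accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))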
Upvotes: 0