Reputation: 45
I am building a simple one-hidden-layer neural network with TensorFlow.
For the inputs, every row of data corresponds to 10 answers. The first 2 elements of each row always agree with the ground-truth label, while the last 8 elements are always the opposite of it.
For example,
[1, 1, 0, 0, 0, 0, 0, 0, 0, 0], correct is 1
[0, 0, 1, 1, 1, 1, 1, 1, 1, 1], correct is 0
[0, 0, 1, 1, 1, 1, 1, 1, 1, 1], correct is 0
[1, 1, 0, 0, 0, 0, 0, 0, 0, 0], correct is 1
I would like my neural network to learn that the first two elements/features always give the correct result. Therefore, I expect the network to assign larger weights to the first two features. However, the network always gets stuck at some loss value.
More interestingly, the accuracy is taken as the proportion of labels that are correctly predicted out of the total number of labels. The loss is the cross-entropy of the sigmoid output, i.e., $-(y \log(\hat{y}) + (1-y) \log(1-\hat{y}))$, where $\hat{y}$ is the sigmoid of the logit. Sometimes, as the loss decreased, the accuracy decreased as well, e.g.,
epoch is: 0 loss is: 7.661093 accuracy value is: 1.0
epoch is: 100 loss is: 7.579134 accuracy value is: 0.54545456
epoch is: 200 loss is: 7.5791006 accuracy value is: 0.54545456
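For concreteness, here is a minimal NumPy sketch of what I mean by these two quantities (the names y_true and y_prob are illustrative, not from my code below):

import numpy as np

def metrics(y_true, y_prob, eps=1e-10):
    # Cross-entropy averaged over samples; clip to avoid log(0).
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    loss = -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    # Accuracy: fraction of 0.5-thresholded predictions that match the labels.
    accuracy = np.mean((y_prob > 0.5) == (y_true == 1))
    return loss, accuracy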
I thought the network could keep increasing the weights of the first two elements until it predicts the correct label perfectly.
Can anyone tell me what I should do to help the network predict the label correctly, instead of getting stuck?
My code is here:
import tensorflow as tf
import numpy as np


class SigmoidNeuralNetwork():
    def __init__(self, learning_rate, training_data, correct_labels, epoch_number):
        self.learning_rate = learning_rate
        self.training_data = training_data
        self.correct_labels = correct_labels
        self.X = tf.placeholder(tf.float32)
        self.y = tf.placeholder(tf.float32)
        self.feature_num = len(self.training_data[0])
        self.sample_num = len(self.training_data)
        self.W = tf.Variable(tf.random_uniform([self.feature_num, 1], -1.0, 1.0), dtype=tf.float32)
        self.b = tf.Variable([0.0])
        self.epoch_number = epoch_number

    def launch_network(self):
        db = tf.matmul(self.X, tf.reshape(self.W, [-1, 1])) + self.b
        hyp = tf.sigmoid(db)
        cost0 = self.y * tf.log(tf.clip_by_value(hyp, 1e-10, 1.0))
        cost1 = (1 - self.y) * tf.log(tf.clip_by_value((1 - hyp), 1e-10, 1.0))
        cost = (cost0 + cost1) / float(self.sample_num)
        loss = -tf.reduce_sum(cost)
        optimizer = tf.train.GradientDescentOptimizer(learning_rate=self.learning_rate)
        train = optimizer.minimize(loss)
        #
        new_train_X = self.training_data.astype(np.float32)
        output = tf.add(tf.matmul(new_train_X, self.W), self.b)
        prediction = tf.sigmoid(output)
        predicted_class = tf.greater(prediction, 0.5)
        ground_labels = tf.reshape(tf.equal(self.y, 1.0), predicted_class.shape)
        correct = tf.equal(predicted_class, ground_labels)
        accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
        #
        init = tf.global_variables_initializer()
        sess = tf.Session()
        sess.run(init)
        for epoch in range(self.epoch_number):
            _, loss_val, accuracy_val = sess.run([train, loss, accuracy],
                                                 {self.X: self.training_data, self.y: self.correct_labels})
            if epoch % 100 == 0:
                print "epoch is: ", epoch, "loss is: ", loss_val, " accuracy value is: ", accuracy_val
                # print "weight is: ", sess.run(self.W).flatten()


train_data = np.array([
    [1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
])
correct_answers = np.array([1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1])
sigmoid_network = SigmoidNeuralNetwork(learning_rate=0.01, training_data=train_data,
                                       correct_labels=correct_answers, epoch_number=10000)
sigmoid_network.launch_network()
Upvotes: 1
Views: 749
Reputation: 578
OP wrote:
I thought the network could keep increasing the weights of the first two elements until it can completely predict the correct label.
You are completely right.
Can anyone please tell me what should I do to facilitate the network correctly predicting the label, instead of getting stuck?
The problem is in the function launch_network():

def launch_network(self):
    db = tf.matmul(self.X, tf.reshape(self.W, [-1, 1])) + self.b
    hyp = tf.sigmoid(db)
    cost0 = self.y * tf.log(tf.clip_by_value(hyp, 1e-10, 1.0))
    ... (skip) ...
Note that db and hyp have the same shape, (self.sample_num, 1) (2-dimensional), but the shape of self.y (that is, of correct_answers) is (self.sample_num,) (1-dimensional). In the line that computes cost0, you multiply self.y * tf.log(...hyp...), and broadcasting expands the result to shape (self.sample_num, self.sample_num), not (self.sample_num, 1).
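You can watch this silent broadcast happen with plain NumPy (a minimal sketch; the shapes mirror hyp and the array fed to self.y):

import numpy as np

hyp = np.full((11, 1), 0.5)                       # same shape as hyp: (sample_num, 1)
y = np.array([1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1])   # correct_answers: (sample_num,)

print((y * np.log(hyp)).shape)                    # (11, 11): silently broadcast
print((y[:, np.newaxis] * np.log(hyp)).shape)     # (11, 1): the intended shape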
The simplest solution is to change the shape of correct_answers to (self.sample_num, 1) (2-dimensional) rather than (self.sample_num,) (1-dimensional), as follows:

correct_answers = np.array([1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1])[:, np.newaxis]
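With that change, self.y * tf.log(...) keeps the shape (self.sample_num, 1), the per-sample costs line up with the labels, and the loss can fall all the way to zero. Equivalently, you could force the shape inside the graph instead of reshaping the data (a sketch of an alternative, not part of the original code):

y2d = tf.reshape(self.y, [-1, 1])  # force the labels to (sample_num, 1)
cost0 = y2d * tf.log(tf.clip_by_value(hyp, 1e-10, 1.0))
cost1 = (1 - y2d) * tf.log(tf.clip_by_value(1 - hyp, 1e-10, 1.0))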
Upvotes: 1