rzrshr

Reputation: 95

Binary classifier always returns 0.5

I am training a classifier that takes an RGB input (three values from 0 to 255) and returns whether a black or white font (0 or 1) would fit best with that colour. After training, my classifier always returns 0.5 (or thereabouts) and never gets any more accurate than that.

The code is below:

import tensorflow as tf
import numpy as np
from tqdm import tqdm

print('Creating Datasets:')

x_train = []
y_train = []

for i in tqdm(range(10000)):
    x_train.append([np.random.uniform(0, 255), np.random.uniform(0, 255), np.random.uniform(0, 255)])

for elem in tqdm(x_train):
    if (((elem[0] + elem[1] + elem[2]) / 3) / 255) > 0.5:
        y_train.append(0)
    else:
        y_train.append(1)

x_train = np.array(x_train)
y_train = np.array(y_train)

graph = tf.Graph()

with graph.as_default():

    x = tf.placeholder(tf.float32)
    y = tf.placeholder(tf.float32)

    w_1 = tf.Variable(tf.random_normal([3, 10], stddev=1.0), tf.float32)
    b_1 = tf.Variable(tf.random_normal([10]), tf.float32)
    l_1 = tf.sigmoid(tf.matmul(x, w_1) + b_1)

    w_2 = tf.Variable(tf.random_normal([10, 10], stddev=1.0), tf.float32)
    b_2 = tf.Variable(tf.random_normal([10]), tf.float32)
    l_2 = tf.sigmoid(tf.matmul(l_1, w_2) + b_2)

    w_3 = tf.Variable(tf.random_normal([10, 5], stddev=1.0), tf.float32)
    b_3 = tf.Variable(tf.random_normal([5]), tf.float32)
    l_3 = tf.sigmoid(tf.matmul(l_2, w_3) + b_3)

    w_4 = tf.Variable(tf.random_normal([5, 1], stddev=1.0), tf.float32)
    b_4 = tf.Variable(tf.random_normal([1]), tf.float32)
    y_ = tf.sigmoid(tf.matmul(l_3, w_4) + b_4)

    loss = tf.reduce_mean(tf.squared_difference(y, y_))

    optimizer = tf.train.AdadeltaOptimizer().minimize(loss)

    with tf.Session() as sess:

        sess.run(tf.global_variables_initializer())

        print('Training:')

        for step in tqdm(range(5000)):
            index = np.random.randint(0, len(x_train) - 129)
            feed_dict = {x : x_train[index:index+128], y : y_train[index:index+128]}
            sess.run(optimizer, feed_dict=feed_dict)
            if step % 1000 == 0:
                print(sess.run([loss], feed_dict=feed_dict))

        while True:
            inp1 = int(input(''))
            inp2 = int(input(''))
            inp3 = int(input(''))
            print(sess.run(y_, feed_dict={x : [[inp1, inp2, inp3]]}))

As you can see, I start by importing the modules I will be using. Next I generate my input x dataset and desired output y dataset. The x_train dataset consists of 10000 random RGB values, while the y_train dataset consists of 0's and 1's, with a 1 corresponding to an RGB value with a mean lower than 128 and a 0 corresponding to an RGB value with a mean higher than 128 (this ensures bright backgrounds get dark font and vice versa).

My neural net is admittedly overly complex (or so I assume), but as far as I am aware it is a pretty standard feed-forward net, with an Adadelta optimiser and the default learning rate.

The training of the net looks normal as far as my limited knowledge can tell, but the model nonetheless always spits out 0.5.

The last block of code allows the user to input values and see what they turn into when passed to the neural net.

I have messed around with different activation functions, losses, methods of initialising the biases, etc., but to no avail. Sometimes when I tinker with the code the model always returns 1 or 0 instead, but that is just as inaccurate as being indecisive and returning 0.5 over and over. I have not been able to find a suitable solution to my problem online. Any advice or suggestions are welcome.

Edit:

The loss, weights, biases and output don't change much over the course of training (the weights and biases only change by hundredths or thousandths every 1000 iterations, and the loss fluctuates around 0.3). Also, the output sometimes varies depending on the input (as you would expect), but at other times it is constant. One run of the program led to constant 0.7s as output, while another always returned 0.5 except very near zero, where it returned values around 0.3 or 0.4. Neither of these is the desired behaviour. What should happen is that (255, 255, 255) maps to 0, (0, 0, 0) maps to 1, and (128, 128, 128) maps to either 1 or 0, since in the middle the font colour doesn't really matter.
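
For reference, this is roughly the kind of logging that shows the problem, slotted into the training loop above (loss, w_1 and y_ are the variables already defined there):

if step % 1000 == 0:
    # check the loss, the first-layer weights and a few raw outputs
    batch_loss, first_weights, sample_out = sess.run([loss, w_1, y_], feed_dict=feed_dict)
    print('loss:', batch_loss)
    print('w_1 first row:', first_weights[0])
    print('sample outputs:', sample_out[:5].ravel())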

Upvotes: 4

Views: 4425

Answers (2)

Max Weinzierl

Reputation: 1261

The largest issue was that you were using mean squared error as your loss function on a classification problem. The cross-entropy loss function is much more suited for this kind of problem.

Here's a visualization of the difference between the cross-entropy loss function and the mean squared error loss function:

[Figure: MSE vs cross-entropy loss for a true label of 1. Source: Wolfram Alpha]

Notice how the cross-entropy loss keeps growing as the model gets further from the correct prediction (in this case 1). This curvature provides a much stronger gradient signal during backpropagation, while also satisfying many important theoretical properties of probability-distribution distances (divergences). By minimizing the cross-entropy loss you are actually also minimizing the KL divergence between your model's prediction distribution and the training-data label distribution. You can read more about the cross-entropy loss function here: http://colah.github.io/posts/2015-09-Visual-Information/
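
To see this concretely, here is a quick NumPy sketch (purely illustrative, separate from the model code below) comparing the two losses on a single example whose true label is 1, as the predicted probability moves away from 1:

import numpy as np

p = np.array([0.9, 0.5, 0.1, 0.01])  # predicted probability of the correct class
mse = (1 - p) ** 2                   # squared error against the label 1
xent = -np.log(p)                    # cross-entropy against the label 1

for p_i, m, c in zip(p, mse, xent):
    print('p={:.2f}  MSE={:.3f}  cross-entropy={:.3f}'.format(p_i, m, c))

# MSE is bounded by 1 no matter how wrong the prediction gets, while
# cross-entropy grows without bound as p -> 0, so the gradient signal
# stays strong for confidently wrong predictions.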

I also tweaked a few other things to make the code better and make the model easier to modify. This should solve all your problems:

import tensorflow as tf
import numpy as np
from tqdm import tqdm

# define a random seed for (somewhat) reproducible results:
seed = 0
np.random.seed(seed)
print('Creating Datasets:')

# much faster dataset creation
x_train = np.random.uniform(low=0, high=255, size=[10000, 3])
# easier label creation
# if the average color is greater than half the color space then use black, otherwise use white
# classes:
# white = 0
# black = 1
y_train = ((np.mean(x_train, axis=1) / 255.0) > 0.5).astype(int)

# now transform dataset to be within range [-1, 1] instead of [0, 255] 
# for numeric stability and quicker model training
x_train = (2 * (x_train / 255)) - 1

graph = tf.Graph()

with graph.as_default():
    # must do this within graph scope
    tf.set_random_seed(seed)
    # specify input dims for clarity
    x = tf.placeholder(tf.float32, shape=[None, 3])
    # y is now integer label [0 or 1]
    y = tf.placeholder(tf.int32, shape=[None])
    # use relu, usually better than sigmoid 
    activation_fn = tf.nn.relu
    # from https://arxiv.org/abs/1502.01852v1
    initializer = tf.initializers.variance_scaling(
        scale=2.0, 
        mode='fan_in',
        distribution='truncated_normal')
    # better api to reduce clutter
    l_1 = tf.layers.dense(
        x,
        10,
        activation=activation_fn,
        kernel_initializer=initializer)
    l_2 = tf.layers.dense(
        l_1,
        10,
        activation=activation_fn,
        kernel_initializer=initializer)
    l_3 = tf.layers.dense(
        l_2,
        5,
        activation=activation_fn,
        kernel_initializer=initializer)
    y_logits = tf.layers.dense(
        l_3,
        2,
        activation=None,
        kernel_initializer=initializer)

    y_ = tf.nn.softmax(y_logits)
    # much better loss function for classification
    loss = tf.reduce_mean(
        tf.losses.sparse_softmax_cross_entropy(
            labels=y, 
            logits=y_logits))
    # much better default optimizer for new problems
    # good learning rate, but probably can tune
    optimizer = tf.train.AdamOptimizer(
        learning_rate=0.01)
    # separate train op for easier calling
    train_op = optimizer.minimize(loss)

    # tell tensorflow not to allocate all gpu memory at start
    config = tf.ConfigProto()
    config.gpu_options.allow_growth=True
    with tf.Session(config=config) as sess:

        sess.run(tf.global_variables_initializer())

        print('Training:')

        for step in tqdm(range(5000)):
            index = np.random.randint(0, len(x_train) - 129)
            feed_dict = {x : x_train[index:index+128], 
                         y : y_train[index:index+128]}
            # can train and get loss in single run, much more efficient
            _, b_loss = sess.run([train_op, loss], feed_dict=feed_dict)
            if step % 1000 == 0:
                print(b_loss)

        while True:
            inp1 = int(input('Enter R pixel color: '))
            inp2 = int(input('Enter G pixel color: '))
            inp3 = int(input('Enter B pixel color: '))
            # scale to model train range [-1, 1]
            model_input = (2 * (np.array([inp1, inp2, inp3], dtype=float) / 255.0)) - 1
            if (model_input >= -1).all() and (model_input <= 1).all():
                # y_ is now two probabilities (white_prob, black_prob) but they will sum to 1.
                white_prob, black_prob = sess.run(y_, feed_dict={x : [model_input]})[0]
                print('White prob: {:.2f} Black prob: {:.2f}'.format(white_prob, black_prob))
            else:
                print('Values not within [0, 255]!')

I documented my changes with comments, but let me know if you have any questions! I ran this on my end and it worked perfectly:

Creating Datasets:
2018-10-05 00:50:59.156822: I T:\src\github\tensorflow\tensorflow\core\platform\cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2018-10-05 00:50:59.411003: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1405] Found device 0 with properties:
name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.7335
pciBusID: 0000:03:00.0
totalMemory: 8.00GiB freeMemory: 6.60GiB
2018-10-05 00:50:59.417736: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1484] Adding visible gpu devices: 0
2018-10-05 00:51:00.109351: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-10-05 00:51:00.113660: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:971]      0
2018-10-05 00:51:00.118545: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:984] 0:   N
2018-10-05 00:51:00.121605: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6370 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080, pci bus id: 0000:03:00.0, compute capability: 6.1)
Training:
  0%|                                                                                         | 0/5000 [00:00<?, ?it/s]0.6222609
 19%|██████████████▋                                                               | 940/5000 [00:01<00:14, 275.57it/s]0.013466636
 39%|██████████████████████████████                                               | 1951/5000 [00:02<00:04, 708.07it/s]0.0067519126
 59%|█████████████████████████████████████████████▊                               | 2971/5000 [00:04<00:02, 733.24it/s]0.0028143923
 79%|████████████████████████████████████████████████████████████▌                | 3935/5000 [00:05<00:01, 726.36it/s]0.0073514087
100%|█████████████████████████████████████████████████████████████████████████████| 5000/5000 [00:07<00:00, 698.32it/s]
Enter R pixel color: 1
Enter G pixel color: 1
Enter B pixel color: 1
White prob: 1.00 Black prob: 0.00
Enter R pixel color: 255
Enter G pixel color: 255
Enter B pixel color: 255
White prob: 0.00 Black prob: 1.00
Enter R pixel color: 128
Enter G pixel color: 128
Enter B pixel color: 128
White prob: 0.08 Black prob: 0.92
Enter R pixel color: 126
Enter G pixel color: 126
Enter B pixel color: 126
White prob: 0.99 Black prob: 0.01

Upvotes: 1

xdurch0

Reputation: 10474

Two things I see from looking at your network:

  1. Sigmoid activation in the hidden layers is usually a bad choice. The sigmoid function saturates for large (positive or negative) inputs, so the gradient becomes smaller and smaller as it is backpropagated through the network. This is commonly referred to as the "vanishing gradient" problem. It could be that the gradient for the variables near the output is "healthy" and so the upper layers are learning, but if the lower layers receive no gradient they will simply keep returning random values that the higher layers can't work with. You could try replacing the sigmoid activations with e.g. tf.nn.relu. Sigmoid in the output layer is okay (and more or less necessary if you want your outputs to be 0/1), but consider using cross entropy as the loss function instead of squared error.
  2. Your weight initialization likely results in excessively large weights. A standard deviation of 1.0 is way too high. This can lead to numerical issues as well as saturating the activations even more (since, with such large weights, you can expect large activation values from the start). Try something like a standard deviation of 0.1, and consider using truncated_normal to prevent outliers (or a uniform random initialization). A small sketch illustrating both points follows below.

It's difficult to say whether this will fix your issues, but I believe both of these are things you should definitely change about your network as it stands right now.
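
To illustrate both points with concrete numbers, here is a rough NumPy-only sketch (illustrative values, not a fix by itself): it prints how flat the sigmoid's local gradient is away from zero, and how large the first layer's pre-activations get with raw [0, 255] inputs and weights drawn with stddev 1.0:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# local gradient of the sigmoid: sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z))
for z in (0.0, 2.0, 5.0, 10.0, 50.0):
    s = sigmoid(z)
    print('z={:>4.0f}: sigmoid={:.6f}, local gradient={:.2e}'.format(z, s, s * (1.0 - s)))

# with the question's setup, the first layer's pre-activations are already huge:
rng = np.random.RandomState(0)
x = rng.uniform(0, 255, size=(128, 3))    # raw, unscaled RGB inputs
w = rng.normal(0.0, 1.0, size=(3, 10))    # weights drawn with stddev=1.0
print('typical |pre-activation|:', np.abs(x @ w).mean())

The local gradient peaks at 0.25 at z = 0 and is practically zero by |z| of about 10, while the pre-activations printed above typically land in the hundreds, so the first layer's sigmoids start out saturated and barely learn.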

Upvotes: 3
