Krishnan R.

Reputation: 66

Tensorflow - Neural Network always predicting the same thing

Hi All,

I have been trying to make a neural network that classifies salaries based on certain features. However, when I run my TensorFlow code for this network, it predicts the same value no matter what features I feed in. I have read up on neural network concepts, and my code matches my conceptual understanding, so I am confused about what I am doing wrong. Please explain what you find thoroughly, as I am still very new to this area.

This is my code:

import tensorflow as tf
import numpy as np

n_inputs = 4
n_hidden1 = 2
n_hidden2 = 2
n_outputs = 1000000

X = tf.placeholder(tf.float32, shape=(None, n_inputs), name="X")  
y = tf.placeholder(tf.int64, shape=(None), name="y")  

with tf.name_scope("dnn"):


    hidden1 = tf.layers.dense(X, n_hidden1, name="hidden1",
                              activation=tf.nn.relu)  

    hidden2 = tf.layers.dense(hidden1, n_hidden2, name="hidden2",
                              activation=tf.nn.relu)

    logits = tf.layers.dense(hidden2, n_outputs, name="outputs")

with tf.name_scope("loss"):
    xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)
    loss = tf.reduce_mean(xentropy, name="loss")  

learning_rate = 0.1  

with tf.name_scope("train"):
    optimizer = tf.train.GradientDescentOptimizer(learning_rate)
    training_op = optimizer.minimize(loss)  



init = tf.global_variables_initializer()  

saver = tf.train.Saver()

# Training data. In every 1-D array, the first 4 elements are features and the last element is a label/output.
train_x = [[11, 3, 2, 4, 150000], [9, 2, 1, 2, 90000], [10, 4, 3, 1, 140000], [11, 3, 4, 4, 170000],
           [8, 2, 1, 3, 105000], [7, 2, 1, 2, 95000], [11, 4, 2, 4, 145000], [10, 4, 1, 4, 110000],
           [9, 3, 4, 4, 160000], [8, 2, 3, 4, 145000], [7, 4, 2, 4, 130000], [8, 2, 1, 2, 101000],
           [10, 2, 2, 3, 130000], [10, 3, 3, 3, 140000], [8, 3, 1, 2, 105000], [7, 4, 1, 3, 95000],
           [10, 3, 4, 3, 165000], [10, 3, 4, 4, 167000], [10, 4, 4, 1, 166000], [8, 4, 2, 4, 137000],
           [9, 2, 2, 4, 140000], [8, 2, 2, 2, 142000], [9, 2, 2, 3, 143000], [9, 2, 2, 4, 144000], [8, 4, 2, 2, 140000],
           [6, 4, 1, 4, 110000], [7, 3, 1, 2, 100000], [8, 3, 1, 3, 101000], [7, 2, 1, 3, 100000], [7, 2, 1, 3, 950000],
           [7, 4, 1, 4, 980000], [8, 4, 1, 4, 100000], [8, 3, 1, 4, 100000], [9, 3, 1, 2, 101000], [8, 3, 1, 2, 107000],
           [8, 3, 2, 2, 110000], [8, 2, 2, 3, 115000], [7, 4, 2, 2, 112000], [8, 2, 2, 4, 120000], [8, 4, 2, 4, 122000],
           [8, 2, 2, 3, 120000], [8, 3, 2, 4, 123000], [8, 3, 2, 4, 121000], [8, 2, 2, 4, 121000], [8, 4, 2, 2, 120000]]

with tf.Session() as sess:
    init.run()  

#Training
    for i in range(0, 45):  

        X_batch = [train_x[i][:4]]
        y_batch = train_x[i][4:]

        sess.run(training_op, feed_dict={X: X_batch, y: y_batch})

# Testing
    for i in range(0, 45):
        pred_data = logits.eval(feed_dict={X: [train_x[i][:4]]})
        pred = np.argmax(pred_data, axis=1)
        print("Predicted Value : ", pred, " Expected Value  :", train_x[i][4:])

This is what the predictions turn out looking like:

Predicted Value :  [140000]  Expected Value  : [150000]
Predicted Value :  [140000]  Expected Value  : [90000]
Predicted Value :  [140000]  Expected Value  : [140000]
Predicted Value :  [140000]  Expected Value  : [170000]
Predicted Value :  [140000]  Expected Value  : [105000]
Predicted Value :  [140000]  Expected Value  : [95000]
Predicted Value :  [140000]  Expected Value  : [145000]
Predicted Value :  [140000]  Expected Value  : [110000]
Predicted Value :  [140000]  Expected Value  : [160000]
Predicted Value :  [140000]  Expected Value  : [145000]
Predicted Value :  [140000]  Expected Value  : [130000]
Predicted Value :  [140000]  Expected Value  : [101000]
...

I have tried basic normalization, changing the learning rate, and other suggestions from related posts and questions, but have gotten nowhere.

Thank you for your help.

Upvotes: 1

Views: 737

Answers (1)

acattle

Reputation: 3113

I think the problem is that you're treating this regression problem as a classification problem. Instead of predicting the number of dollars in the salary directly, you are generating a 1,000,000-length vector and then selecting the index with the largest value.

There are four problems with this approach. First, you're trying to train (4 x 2) + (2 x 2) + (2 x 1,000,000) = 2,000,012 edge weights (not counting biases) with only 45 examples. That's not nearly enough data.
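
The arithmetic above can be checked with a few lines of plain Python. This is only a sketch: the layer sizes are copied from the question, and the bias count assumes each tf.layers.dense layer keeps its default bias vector, which the answer's figure leaves out.

```python
# Layer sizes taken from the question: 4 inputs, two hidden layers of 2, 1,000,000 outputs.
layer_sizes = [4, 2, 2, 1_000_000]
n_examples = 45  # number of training rows in the question


def count_parameters(sizes):
    """Return (weights, biases) for a fully connected network with these layer sizes."""
    weights = sum(a * b for a, b in zip(sizes, sizes[1:]))
    biases = sum(sizes[1:])  # assumes one bias per non-input node
    return weights, biases


weights, biases = count_parameters(layer_sizes)
print("edge weights:", weights)
print("biases:", biases)
print("parameters per training example:", (weights + biases) // n_examples)
```

Almost all of the parameters sit in the final layer, so shrinking the output is what brings the model back in line with the amount of data.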

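Second, assuming you DO want to treat this as a classification problem, your input y is an integer while your output is a 1,000,000-length vector, and nowhere do you convert this integer to a one-hot vector of length 1,000,000. With tf.nn.sparse_softmax_cross_entropy_with_logits that conversion is not required, because the op takes the integer class index directly, but it does mean that every distinct salary value is its own class, and almost all of the 1,000,000 classes never appear in your training data.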
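
To make the integer-versus-one-hot distinction concrete, here is a small NumPy sketch. The class count of 5 and the label values are hypothetical, chosen only so the arrays stay readable; with the question's setup each row would be 1,000,000 entries long.

```python
import numpy as np

n_classes = 5                  # hypothetical; the question uses 1,000,000
labels = np.array([3, 0, 4])   # integer class indices, one per example

# One-hot encoding: row i is all zeros except a 1 at column labels[i].
one_hot = np.eye(n_classes, dtype=np.float32)[labels]
print(one_hot)

# argmax along each row recovers the original integer labels.
recovered = one_hot.argmax(axis=1)
print(recovered)
```

Both forms carry the same information. The integer form is simply far cheaper to store and feed when the number of classes is large.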

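Third, for multiclass classification problems where labels are mutually exclusive (i.e. someone's salary can't be $15,000 AND $18,000 at the same time), the standard procedure is to give the output a softmax activation function. The net effect is that during training the network learns to push one output node toward 1 and all the others toward 0. (Your loss, tf.nn.sparse_softmax_cross_entropy_with_logits, already applies softmax to the logits internally, and the argmax of the raw logits picks the same class as the argmax of the softmax output.)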
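
A minimal NumPy sketch of what softmax does to a vector of raw output values. The four logits and the true class index are made-up numbers for illustration only.

```python
import numpy as np


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()


logits = np.array([2.0, 0.5, -1.0, 4.0])  # hypothetical raw outputs for 4 classes
probs = softmax(logits)
print("probabilities:", probs)

# Softmax preserves ordering, so the most likely class is the same either way.
print("argmax of logits:", logits.argmax(), "argmax of probs:", probs.argmax())

# Cross-entropy for a single example is the negative log-probability of the true class.
true_class = 3
loss = -np.log(probs[true_class])
print("cross-entropy loss:", loss)
```

Minimizing this loss raises the probability of the true class, which is what drives one output toward 1 and the rest toward 0.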

Fourth, by treating salary prediction as a classification problem, the network treats a predicted salary that is $1 off the expected value as being just as bad as a predicted salary that's $10,000 off. This is obviously not true. Instead of training a 1,000,000-node output, try training a single output node (with a relu activation to avoid negative salaries) and a regression loss such as mean squared error. Then take the value of the output node as the predicted salary instead of the argmax.
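
As a rough illustration of the single-output idea, here is a sketch in plain NumPy rather than TensorFlow. It rests on several assumptions that are not in the original post: it uses only the first eight rows of the question's data, standardizes the features, works in thousands of dollars, and uses one linear output node with no hidden layers and no relu, trained by gradient descent on mean squared error.

```python
import numpy as np

# First eight rows of the question's data: four features, then the salary.
rows = np.array([
    [11, 3, 2, 4, 150000],
    [9, 2, 1, 2, 90000],
    [10, 4, 3, 1, 140000],
    [11, 3, 4, 4, 170000],
    [8, 2, 1, 3, 105000],
    [7, 2, 1, 2, 95000],
    [11, 4, 2, 4, 145000],
    [10, 4, 1, 4, 110000],
], dtype=np.float64)

X = rows[:, :4]
y = rows[:, 4] / 1000.0  # assumed scaling: salaries in thousands of dollars

# Standardize each feature so plain gradient descent is well behaved.
X = (X - X.mean(axis=0)) / X.std(axis=0)

# A single output node: prediction = X @ w + b.
w = np.zeros(4)
b = 0.0
learning_rate = 0.05


def mse(w, b):
    """Mean squared error of the current model on the training rows."""
    return np.mean((X @ w + b - y) ** 2)


initial_loss = mse(w, b)
for _ in range(2000):
    error = X @ w + b - y
    w -= learning_rate * 2 * (X.T @ error) / len(y)  # gradient of MSE w.r.t. w
    b -= learning_rate * 2 * error.mean()            # gradient of MSE w.r.t. b
final_loss = mse(w, b)

predictions = X @ w + b
print("loss before:", initial_loss, "after:", final_loss)
for predicted, expected in zip(predictions, y):
    print("Predicted (k$):", round(predicted, 1), " Expected (k$):", expected)
```

Because the loss grows with the size of the error, a prediction that is $1 off costs far less than one that is $10,000 off, and the output now varies with the input features instead of collapsing to one class.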

Upvotes: 4
