Jacob

Reputation: 461

Loss doesn't get below ~60 and test accuracy never gets above ~40% using TensorFlow

The problem

I'm using a DNNClassifier in TensorFlow and can never get my loss below around 60 or my test accuracy above about 40%. I was having a problem before where my test accuracy was stuck at almost exactly 25%, but after normalizing all my inputs I was able to bring the test accuracy up a bit, though not by much.
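
By "normalizing" I mean roughly a per-column z-score, along these lines (a simplified sketch, not my exact preprocessing code):

import pandas as pd

def normalize(df: pd.DataFrame) -> pd.DataFrame:
    # Scale each feature column to zero mean and unit variance (z-score).
    return (df - df.mean()) / df.std()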

The data

All you need to know about the data is that I have about 127,000 records of crime-rate data, with 15 features and one label. The purpose of the network is to classify each record into the correct population quartile (based on the population of each county), so the output label is one of 4 classes (0-3).
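
To illustrate what the label means, here is a toy sketch of bucketing county population into four equal-sized quartiles with pandas (made-up numbers, not my actual data or preprocessing):

import pandas as pd

# Toy example: split county populations into four equal-sized buckets labelled 0-3.
counties = pd.DataFrame({'population': [1200, 56000, 890000, 23000, 410000, 7800]})
counties['Quartile'] = pd.qcut(counties['population'], q=4, labels=[0, 1, 2, 3])
print(counties)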

The code

import pandas as pd
import tensorflow as tf
import os

dir_path = os.path.dirname(os.path.realpath(__file__))
csv_path = dir_path + "/testing.csv"
CSV_COLUMN_NAMES = ['01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12', '13', '14', '15', 'Quartile']


def load_data():

    # Read the CSV and shuffle all rows (sample(frac=1) returns a shuffled copy).
    all = pd.read_csv(csv_path, names=CSV_COLUMN_NAMES, header=0).sample(frac=1)

    x = all.drop(['Quartile'], axis=1)
    y = all[['Quartile']].copy()

    # 75% / 25% train/test split.
    size = x.shape[0]
    cutoff = int(0.75*size)

    train_x = x.head(cutoff)
    train_y = y.head(cutoff)

    test_x = x.tail(size-cutoff)
    test_y = y.tail(size-cutoff)

    return (train_x, train_y), (test_x, test_y)


def train_input_fn(features, labels, batch_size):

    dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))

    # Shuffle, repeat indefinitely, and batch the training examples.
    dataset = dataset.shuffle(1000).repeat().batch(batch_size)

    return dataset


def eval_input_fn(features, labels, batch_size):

    features=dict(features)

    if labels is None:
        inputs = features
    else:
        inputs = (features, labels)

    dataset = tf.data.Dataset.from_tensor_slices(inputs)

    assert batch_size is not None, "batch_size must not be None"
    dataset = dataset.batch(batch_size)

    return dataset


def main(argv):

    batch_size = 50

    (train_x, train_y), (test_x, test_y) = load_data()

    my_feature_columns = []
    for key in train_x.keys():
        my_feature_columns.append(tf.feature_column.numeric_column(key=key))

    # Two hidden layers of 10 units each, plain gradient descent with a 0.001 learning rate.
    classifier = tf.estimator.DNNClassifier(
        feature_columns=my_feature_columns,
        hidden_units=[10, 10],
        optimizer=tf.train.GradientDescentOptimizer(0.001),
        n_classes=4)

    # training
    classifier.train(
        input_fn=lambda:train_input_fn(train_x, train_y, batch_size), steps=5000)

    # testing
    eval_result = classifier.evaluate(
        input_fn=lambda:eval_input_fn(test_x, test_y, batch_size))

    print('\nTest set accuracy: {accuracy:0.3f}\n'.format(**eval_result))


if __name__ == '__main__':
    tf.logging.set_verbosity(tf.logging.INFO)
    tf.app.run(main)

What I've tried

I was hoping you would be able to suggest possible reasons why my neural network seems to be stalling out. Thanks!

Upvotes: 1

Views: 231

Answers (1)

picnix_

Reputation: 387

You may want to look in the following directions (a sketch applying them follows the list):

  • Decrease the number of hidden units the deeper you go in the network (e.g. layers of 10 and 5 units rather than 10 and 10).
  • Use Adam instead of (stochastic) gradient descent, or try another first-order optimization method.
  • Increase the batch size to a power of 2 (64, 128, 256, ...).
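
As a minimal sketch (reusing the variable names from your question and leaving the rest of the script unchanged), the classifier and batch size could be changed like this:

classifier = tf.estimator.DNNClassifier(
    feature_columns=my_feature_columns,
    hidden_units=[10, 5],                     # taper the layers: 10 then 5 instead of 10 then 10
    optimizer=tf.train.AdamOptimizer(0.001),  # Adam instead of plain gradient descent
    n_classes=4)

batch_size = 128                              # power-of-two batch size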

Upvotes: 1
