Reputation: 461
The problem
I'm using a DNNClassifier in TensorFlow and can never get my loss below around 60 or my test accuracy above about 40%. I was having a worse problem before, where my test accuracy was almost dead set at 25%, but after normalizing all my inputs I was able to bring my test accuracy up a bit, though not by much.
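For context, a minimal sketch of one way such input normalization can be done with pandas (per-column z-scoring here is only an assumption, not necessarily the exact method used; the DataFrame is invented for illustration):

import pandas as pd

# Hypothetical feature frame; stands in for the 15 feature columns below.
df = pd.DataFrame({'01': [1.0, 2.0, 3.0], '02': [10.0, 20.0, 30.0]})
# Per-column z-score: zero mean, unit standard deviation.
normalized = (df - df.mean()) / df.std()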
The data
All you need to know about the data is that I have about 127,000 records of crime-rate data, with 15 features and one label. The purpose of the network is to classify each record into the correct population quartile (based on the population of each county), so the output label is one of just 4 classes (0-3).
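As an aside, quartile labels like these can be produced with pandas. A hypothetical sketch follows; the post does not show how the labels were actually built, and populations is an invented example Series:

import pandas as pd

# Invented county populations; qcut buckets them into quartiles labeled 0-3.
populations = pd.Series([1200, 45000, 310000, 9800, 72000, 150000, 3400, 880000])
quartile_labels = pd.qcut(populations, q=4, labels=False)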
The code
import pandas as pd
import tensorflow as tf
import os
dir_path = os.path.dirname(os.path.realpath(__file__))
csv_path = dir_path + "/testing.csv"
CSV_COLUMN_NAMES = ['01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12', '13', '14', '15', 'Quartile']
def load_data():
    all = pd.read_csv(csv_path, names=CSV_COLUMN_NAMES, header=0).sample(frac=1)
    x = all.drop(['Quartile'], axis=1)
    y = all[['Quartile']].copy()
    size = x.shape[0]
    cutoff = int(0.75 * size)
    train_x = x.head(cutoff)
    train_y = y.head(cutoff)
    test_x = x.tail(size - cutoff)
    test_y = y.tail(size - cutoff)
    return (train_x, train_y), (test_x, test_y)

def train_input_fn(features, labels, batch_size):
    dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))
    dataset = dataset.shuffle(1000).repeat().batch(batch_size)
    return dataset

def eval_input_fn(features, labels, batch_size):
    features = dict(features)
    if labels is None:
        inputs = features
    else:
        inputs = (features, labels)
    dataset = tf.data.Dataset.from_tensor_slices(inputs)
    assert batch_size is not None, "batch_size must not be None"
    dataset = dataset.batch(batch_size)
    return dataset

def main(argv):
    batch_size = 50
    (train_x, train_y), (test_x, test_y) = load_data()

    my_feature_columns = []
    for key in train_x.keys():
        my_feature_columns.append(tf.feature_column.numeric_column(key=key))

    classifier = tf.estimator.DNNClassifier(
        feature_columns=my_feature_columns,
        hidden_units=[10, 10],
        optimizer=tf.train.GradientDescentOptimizer(0.001),
        n_classes=4)

    # training
    classifier.train(
        input_fn=lambda: train_input_fn(train_x, train_y, batch_size), steps=5000)

    # testing
    eval_result = classifier.evaluate(
        input_fn=lambda: eval_input_fn(test_x, test_y, batch_size))
    print('\nTest set accuracy: {accuracy:0.3f}\n'.format(**eval_result))

if __name__ == '__main__':
    tf.logging.set_verbosity(tf.logging.INFO)
    tf.app.run(main)
What I've tried
I tried using a loss reduction of MEAN instead of SUM, which it defaults to.

I was hoping that you guys would be able to suggest any possible reasons why my neural network seems to be stalling out. Thanks!
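For reference, here is roughly what that change looks like, a sketch assuming the TF 1.x tf.estimator.DNNClassifier, which accepts a loss_reduction argument and defaults it to SUM:

classifier = tf.estimator.DNNClassifier(
    feature_columns=my_feature_columns,
    hidden_units=[10, 10],
    optimizer=tf.train.GradientDescentOptimizer(0.001),
    n_classes=4,
    loss_reduction=tf.losses.Reduction.MEAN)  # average per-example losses instead of summing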
Upvotes: 1
Views: 231
Reputation: 387
You may have a look at the following directions:
2. Try larger hidden layers (64, 128, 256, ...).
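A minimal sketch of that suggestion applied to the question's classifier, assuming the sizes refer to the hidden_units argument (everything else kept as in the question):

classifier = tf.estimator.DNNClassifier(
    feature_columns=my_feature_columns,
    hidden_units=[64, 128, 256],  # wider layers than the original [10, 10]
    optimizer=tf.train.GradientDescentOptimizer(0.001),
    n_classes=4)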
Upvotes: 1