Reputation: 23
I'm taking my first steps with deep learning and tensorflow, so I have some questions.
Following the tutorial and the Getting Started guide, I created a DNN with hidden layers as well as a simple softmax model. I used the dataset from https://archive.ics.uci.edu/ml/datasets/wine and split it into a train and a test dataset.
from __future__ import print_function

import tensorflow as tf

num_attributes = 13
num_types = 3


def read_from_cvs(filename_queue):
    # Parse one CSV line: column 0 is the class label, the rest are attributes.
    reader = tf.TextLineReader()
    key, value = reader.read(filename_queue)
    record_defaults = [[] for col in range(num_attributes + 1)]
    attributes = tf.decode_csv(value, record_defaults=record_defaults)
    features = tf.stack(attributes[1:], name="features")
    labels = tf.one_hot(
        tf.cast(tf.stack(attributes[0], name="labels"), tf.uint8),
        num_types + 1, name="labels-onehot")
    return features, labels


def input_pipeline(filename='wine_train.csv', batch_size=30, num_epochs=None):
    filename_queue = tf.train.string_input_producer(
        [filename], num_epochs=num_epochs, shuffle=True)
    features, labels = read_from_cvs(filename_queue)

    min_after_dequeue = 2 * batch_size
    capacity = min_after_dequeue + 3 * batch_size
    feature_batch, label_batch = tf.train.shuffle_batch(
        [features, labels], batch_size=batch_size, capacity=capacity,
        min_after_dequeue=min_after_dequeue)
    return feature_batch, label_batch


def train_and_test(hidden1, hidden2, learning_rate, epochs,
                   train_batch_size, test_batch_size, test_interval):
    examples_train, labels_train = input_pipeline(
        filename="wine_train.csv", batch_size=train_batch_size)
    examples_test, labels_test = input_pipeline(
        filename="wine_train.csv", batch_size=test_batch_size)

    with tf.name_scope("first_layer"):
        x = tf.placeholder(tf.float32, [None, num_attributes], name="input")
        weights1 = tf.Variable(
            tf.random_normal(shape=[num_attributes, hidden1], stddev=0.1),
            name="weights")
        bias = tf.Variable(tf.constant(0.0, shape=[hidden1]), name="bias")
        activation = tf.nn.relu(
            tf.matmul(x, weights1) + bias, name="relu_act")
        tf.summary.histogram("first_activation", activation)

    with tf.name_scope("second_layer"):
        weights2 = tf.Variable(
            tf.random_normal(shape=[hidden1, hidden2], stddev=0.1),
            name="weights")
        bias2 = tf.Variable(tf.constant(0.0, shape=[hidden2]), name="bias")
        activation2 = tf.nn.relu(
            tf.matmul(activation, weights2) + bias2, name="relu_act")
        tf.summary.histogram("second_activation", activation2)

    with tf.name_scope("output_layer"):
        weights3 = tf.Variable(
            tf.random_normal(shape=[hidden2, num_types + 1], stddev=0.5),
            name="weights")
        bias3 = tf.Variable(tf.constant(1.0, shape=[num_types + 1]), name="bias")
        output = tf.add(
            tf.matmul(activation2, weights3, name="mul"), bias3, name="output")
        tf.summary.histogram("output_activation", output)

    y_ = tf.placeholder(tf.float32, [None, num_types + 1])

    with tf.name_scope("loss"):
        cross_entropy = tf.reduce_mean(
            tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=output))
        tf.summary.scalar("cross_entropy", cross_entropy)

    with tf.name_scope("train"):
        train_step = tf.train.GradientDescentOptimizer(
            learning_rate).minimize(cross_entropy)

    with tf.name_scope("tests"):
        correct_prediction = tf.equal(tf.argmax(output, 1), tf.argmax(y_, 1))
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
        tf.summary.scalar("accuracy", accuracy)

    summary_op = tf.summary.merge_all()

    sess = tf.InteractiveSession()
    writer = tf.summary.FileWriter("./wineDnnLow", sess.graph)
    tf.global_variables_initializer().run()

    # Start the queue-runner threads that fill the input pipelines.
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord, sess=sess)

    try:
        step = 0
        while not coord.should_stop() and step < epochs:
            # train
            ex, lab = sess.run([examples_train, labels_train])
            _ = sess.run([train_step], feed_dict={x: ex, y_: lab})
            # test
            if step % test_interval == 0:
                ex, lab = sess.run([examples_test, labels_test])
                summary, test_accuracy = sess.run(
                    [summary_op, accuracy], feed_dict={x: ex, y_: lab})
                writer.add_summary(summary, step)
                print("accuracy = {0:f} at step {1}".format(test_accuracy, step))
            step += 1
    except tf.errors.OutOfRangeError:
        print("Done training for %d steps" % (step))

    coord.request_stop()
    coord.join(threads)
    sess.close()


def main():
    train_and_test(10, 20, 0.5, 700, 30, 10, 1)


if __name__ == '__main__':
    main()
The problem is that the accuracy does not converge and seems to take random values. But when I try tf.contrib.learn.DNNClassifier, my data gets classified pretty well. So can anyone give me a hint where the problem is in my self-created DNN?
Moreover, I have a second question. During training I pass train_step to session.run(), but during testing I do not. Does this ensure that the weights are not influenced, so that the graph does not learn from the testing?
Edit: If I use the MNIST dataset and its batch handling instead of mine, the net behaves well. Therefore, I think the problem is caused by the input_pipeline.
Upvotes: 2
Views: 214
Reputation: 4183
A quick glance at the dataset indicates to me the first thing I'd do is normalize it (subtract mean, divide by standard deviation). That said, it's still a very small dataset compared to MNIST, so don't expect everything to work exactly the same.
If you're unsure of your input pipeline, just load all the data into memory rather than running it through the pipeline.
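For example, something along these lines would do it (just a sketch: it reuses num_types, x, y_, train_step and sess from your code, and assumes wine_train.csv has no header row):

import numpy as np

# Column 0 is the class label (1-3), the remaining 13 columns are the attributes.
data = np.loadtxt("wine_train.csv", delimiter=",")
labels = data[:, 0].astype(np.int32)
features = data[:, 1:]

# Normalize column-wise: subtract the mean, divide by the standard deviation.
features = (features - features.mean(axis=0)) / features.std(axis=0)

# One-hot encode the labels to match the placeholder shape [None, num_types + 1].
one_hot = np.eye(num_types + 1)[labels]

# The whole file fits into memory, so just feed it all on every step.
for step in range(700):
    sess.run(train_step, feed_dict={x: features, y_: one_hot})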
A few general notes:
For a dataset this small there's nothing wrong with feeding it through feed_dict, but if it was massive you'd be better off removing the placeholders and just using the output of the input_pipeline (and building a separate graph for testing).
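A rough sketch of that variant, using the tf.layers calls from the next note together with the input_pipeline and num_types from your code (layer sizes and learning rate are just the ones from your main()):

# No placeholders: build the graph directly on top of the queue's output,
# so every sess.run(train_step) pulls a fresh batch from the pipeline.
feature_batch, label_batch = input_pipeline("wine_train.csv", batch_size=30)

hidden = tf.layers.dense(feature_batch, 10, activation=tf.nn.relu)
hidden = tf.layers.dense(hidden, 20, activation=tf.nn.relu)
logits = tf.layers.dense(hidden, num_types + 1)

cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=label_batch, logits=logits))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)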
Use the tf.layers API for common layer types. For example, your inference section can be effectively reduced to the following three lines:
activation = tf.layers.dense(x, hidden1, activation=tf.nn.relu)
activation2 = tf.layers.dense(activation, hidden2, activation=tf.nn.relu)
output = tf.layers.dense(activation2, num_types + 1)
(You won't have the same initialization, but you can specify those with optional arguments. The defaults are a good place to start though.)
GradientDescentOptimizer is very primitive. My current favourite is AdamOptimizer, but experiment with others. If that looks too complex for you, MomentumOptimizer generally gives a good trade-off between complexity and performance benefits.
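Swapping it out is a one-line change, e.g. (the learning rates here are just typical starting points to tune):

# Adam usually wants a much smaller learning rate than plain gradient descent.
train_step = tf.train.AdamOptimizer(learning_rate=0.001).minimize(cross_entropy)
# or, as a middle ground:
train_step = tf.train.MomentumOptimizer(learning_rate=0.01, momentum=0.9).minimize(cross_entropy)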
Check out the tf.estimator.Estimator API. It'll make a lot of what you're doing much easier and force you to separate data loading from the model itself (a good thing).
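A minimal sketch of what that could look like here, assuming the features/labels numpy arrays from above (the feature-column name "x" and the hyperparameters are arbitrary choices):

feature_columns = [tf.feature_column.numeric_column("x", shape=[num_attributes])]

classifier = tf.estimator.DNNClassifier(
    feature_columns=feature_columns,
    hidden_units=[10, 20],
    n_classes=num_types + 1)  # labels are 1-3, so reserve an unused class 0

train_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": features}, y=labels, batch_size=30, num_epochs=None, shuffle=True)

classifier.train(input_fn=train_input_fn, steps=700)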
Check out the tf.contrib.data.Dataset API for data preprocessing. Queues have been around for a while in tensorflow, so that's what most of the tutorials are written for, but the Dataset API is much more intuitive/easier in my opinion. Again, it's a bit overkill for this situation where you can load all the data into memory easily. See this question for how to use a Dataset starting from a CSV file.
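For completeness, a rough sketch of a Dataset-based pipeline for this CSV (depending on your TensorFlow version it lives under tf.contrib.data or tf.data; the shuffle buffer and the 0.0 column defaults are illustrative):

def dataset_input_fn(filename="wine_train.csv", batch_size=30):
    def parse_line(line):
        # One float default per column; column 0 is the label.
        fields = tf.decode_csv(line, record_defaults=[[0.0]] * (num_attributes + 1))
        label = tf.one_hot(tf.cast(fields[0], tf.uint8), num_types + 1)
        features = tf.stack(fields[1:])
        return features, label

    dataset = tf.data.TextLineDataset(filename)
    dataset = dataset.map(parse_line).shuffle(buffer_size=200).batch(batch_size).repeat()
    # The returned tensors can be used like the ones from the queue-based pipeline.
    return dataset.make_one_shot_iterator().get_next()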
Upvotes: 1