Reputation: 850
I am training a convolutional neural network and I got some unexpected behavior from the shuffle_batch fraction summary, or maybe I just do not understand it. Can someone please explain it? The only difference between the two graphs is that I exchanged the loss function.
With this loss function I get a flat line at 0.0:
loss = tf.nn.l2_loss(expected_labels-labels)
While this one gives me a constant 1.0 (after hitting 1.0 the first time):
loss = tf.reduce_mean(tf.square(expected_labels - labels))
Can changing the loss function really cause this? I am not sure what it means.
EDIT: Code as requested. The first part sets up the batching and the big picture:
filename_queue = tf.train.string_input_producer(filenames, num_epochs=None)

label, image = read_and_decode_single_example(filename_queue=filename_queue)
image = tf.image.decode_jpeg(image.values[0], channels=3)
jpeg = tf.cast(image, tf.float32) / 255.
jpeg.set_shape([66, 200, 3])

images_batch, labels_batch = tf.train.shuffle_batch(
    [jpeg, label], batch_size=FLAGS.batch_size,
    num_threads=8,
    capacity=60000,
    min_after_dequeue=10000)

images_placeholder, labels_placeholder = placeholder_inputs(FLAGS.batch_size)

label_estimations, W1_conv, h1_conv, current_images = e2e.inference(images_placeholder)

# Add to the Graph the Ops for loss calculation.
loss = e2e.loss(label_estimations, labels_placeholder)

# Decay once per epoch, using an exponential schedule starting at 0.01.
# Add to the Graph the Ops that calculate and apply gradients.
train_op = e2e.training(loss, FLAGS.learning_rate, FLAGS.batch_size)
Here are the methods for inference, loss, and training:
def inference(images):
    with tf.name_scope('conv1'):
        W_conv1 = tf.Variable(tf.truncated_normal([5, 5, 3, FEATURE_MAPS_C1], stddev=STDDEV))
        b_conv1 = tf.Variable(tf.constant(BIAS_INIT, shape=[FEATURE_MAPS_C1]))
        h_conv1 = tf.nn.bias_add(
            tf.nn.conv2d(images, W_conv1, strides=[1, 2, 2, 1], padding='VALID'), b_conv1)

    with tf.name_scope('conv2'):
        W_conv2 = tf.Variable(tf.truncated_normal([5, 5, FEATURE_MAPS_C1, 36], stddev=STDDEV))
        b_conv2 = tf.Variable(tf.constant(BIAS_INIT, shape=[36]))
        h_conv2 = tf.nn.conv2d(h_conv1, W_conv2, strides=[1, 2, 2, 1], padding='VALID') + b_conv2

    with tf.name_scope('conv3'):
        W_conv3 = tf.Variable(tf.truncated_normal([5, 5, 36, 48], stddev=STDDEV))
        b_conv3 = tf.Variable(tf.constant(BIAS_INIT, shape=[48]))
        h_conv3 = tf.nn.conv2d(h_conv2, W_conv3, strides=[1, 2, 2, 1], padding='VALID') + b_conv3

    with tf.name_scope('conv4'):
        W_conv4 = tf.Variable(tf.truncated_normal([3, 3, 48, 64], stddev=STDDEV))
        b_conv4 = tf.Variable(tf.constant(BIAS_INIT, shape=[64]))
        h_conv4 = tf.nn.conv2d(h_conv3, W_conv4, strides=[1, 1, 1, 1], padding='VALID') + b_conv4

    with tf.name_scope('conv5'):
        W_conv5 = tf.Variable(tf.truncated_normal([3, 3, 64, 64], stddev=STDDEV))
        b_conv5 = tf.Variable(tf.constant(BIAS_INIT, shape=[64]))
        h_conv5 = tf.nn.conv2d(h_conv4, W_conv5, strides=[1, 1, 1, 1], padding='VALID') + b_conv5

    h_conv5_flat = tf.reshape(h_conv5, [-1, 1 * 18 * 64])

    with tf.name_scope('fc1'):
        W_fc1 = tf.Variable(tf.truncated_normal([1 * 18 * 64, 100], stddev=STDDEV))
        b_fc1 = tf.Variable(tf.constant(BIAS_INIT, shape=[100]))
        h_fc1 = tf.matmul(h_conv5_flat, W_fc1) + b_fc1

    with tf.name_scope('fc2'):
        W_fc2 = tf.Variable(tf.truncated_normal([100, 50], stddev=STDDEV))
        b_fc2 = tf.Variable(tf.constant(BIAS_INIT, shape=[50]))
        h_fc2 = tf.matmul(h_fc1, W_fc2) + b_fc2

    with tf.name_scope('fc3'):
        W_fc3 = tf.Variable(tf.truncated_normal([50, 10], stddev=STDDEV))
        b_fc3 = tf.Variable(tf.constant(BIAS_INIT, shape=[10]))
        h_fc3 = tf.matmul(h_fc2, W_fc3) + b_fc3

    with tf.name_scope('fc4'):
        W_fc4 = tf.Variable(tf.truncated_normal([10, 1], stddev=STDDEV))
        b_fc4 = tf.Variable(tf.constant(BIAS_INIT, shape=[1]))
        h_fc4 = tf.matmul(h_fc3, W_fc4) + b_fc4

    return h_fc4
Here is the loss function; using l2_loss causes the issue:
def loss(label_estimations, labels):
    n_labels = tf.reshape(label_estimations, [-1])
    # Here are the two loss functions
    # loss = tf.reduce_mean(tf.square(n_labels - labels))
    loss = tf.nn.l2_loss(n_labels - labels)
    return loss
Train method:
def training(loss, learning_rate, batch_size):
    global_step = tf.Variable(0, name='global_step', trainable=False)
    tf.scalar_summary('learning_rate', learning_rate)
    tf.scalar_summary('Loss (' + loss.op.name + ')', loss)
    optimizer = tf.train.AdamOptimizer(learning_rate)
    train_op = optimizer.minimize(loss, global_step=global_step)
    return train_op
Plot for tf.reduce_sum(tf.square(n_labels - labels)/2): [image omitted]
Upvotes: 1
Views: 786
Reputation: 91
As mentioned in TensorFlow's reading-data guide (https://www.tensorflow.org/programmers_guide/reading_data):
How many threads do you need? the tf.train.shuffle_batch* functions add a summary to the graph that indicates how full the example queue is. If you have enough reading threads, that summary will stay above zero. You can view your summaries as training progresses using TensorBoard.
The queue should ideally never run empty, i.e. the fraction_full summary should stay non-zero. If it does not, you should allocate more threads to the queue runners.
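A minimal sketch of that adjustment, reusing the asker's shuffle_batch call (the thread count of 16 is an arbitrary illustration, not a tuned recommendation):

# Sketch: give the example queue more reader threads so it can stay filled
# and the fraction_full summary stays above zero.
images_batch, labels_batch = tf.train.shuffle_batch(
    [jpeg, label],
    batch_size=FLAGS.batch_size,
    num_threads=16,          # raised from the original 8
    capacity=60000,
    min_after_dequeue=10000)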
Upvotes: 1
Reputation: 66805
The only difference between your loss and l2_loss is scaling, thus you might need to adjust your learning rate (or other hyperparameters) to take this into account.
l2 loss in TF is defined as:
1/2 SUM_i^N (pred(x_i) - y_i)^2
while your cost is
1/N SUM_i^N (pred(x_i) - y_i)^2
Of course, since you are using a stochastic gradient approach, effectively you are using an approximator of the form
1/2 SUM_{(x_i, y_i) in batch} (pred(x_i) - y_i)^2 # l2
1/#batch SUM_{(x_i, y_i) in batch} (pred(x_i) - y_i)^2 # you
Thus you would have to multiply your cost by batch_size / 2 to recover the original l2 cost. Typically this is not a problem, but sometimes a wrong scaling can put you in very degenerate parts of the error surface, and the optimizer will simply fail (especially an aggressive one like Adam).
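To make the scaling concrete, here is an illustrative check (mine, not part of the original answer; the numbers are arbitrary) that the two losses differ only by a factor of batch_size / 2:

import numpy as np
import tensorflow as tf

batch_size = 4
preds = tf.constant(np.array([1., 2., 3., 4.], dtype=np.float32))
labels = tf.constant(np.zeros(4, dtype=np.float32))

l2 = tf.nn.l2_loss(preds - labels)               # 1/2 * sum of squared errors = 15.0
mse = tf.reduce_mean(tf.square(preds - labels))  # 1/N * sum of squared errors = 7.5

with tf.Session() as sess:
    print(sess.run(l2))                     # 15.0
    print(sess.run(mse) * batch_size / 2.)  # 15.0 again

So scaling the mean-squared cost by batch_size / 2 reproduces l2_loss exactly.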
Side note: are you aware that your model is a deep linear model? There are no non-linearities anywhere in it, so the stacked layers collapse into a single linear map. This is a very specific kind of network.
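For example (a sketch of one option, not taken from the original answer, and assuming the variables defined in inference() above), wrapping each hidden layer in tf.nn.relu makes the network non-linear:

# ReLU after the first conv layer; repeat likewise for conv2 through conv5.
h_conv1 = tf.nn.relu(tf.nn.bias_add(
    tf.nn.conv2d(images, W_conv1, strides=[1, 2, 2, 1], padding='VALID'),
    b_conv1))
# ReLU after the first fully connected layer; repeat for fc2 and fc3.
h_fc1 = tf.nn.relu(tf.matmul(h_conv5_flat, W_fc1) + b_fc1)
# h_fc4, the regression output, is usually left linear.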
Upvotes: 0