Reputation: 2997
TensorBoard runs and I can see the plots, but the losses do not appear to update. I can print the batch losses at each step, so I am uncertain why TensorBoard is not reflecting them.
I am attempting to learn about TensorFlow Slim, following this example:
Attempted
Setting trace_every_n_steps to various values (1, 2, 5)
Code
My code is largely from the "Fine-tune the model on a different set of labels" section of the tutorial.
Explicitly:
import os

from preprocessing import inception_preprocessing

import numpy as np
import tensorflow as tf

from datasets import flowers
from nets import inception
from tensorflow.contrib import slim

import matplotlib.pyplot as plt

image_size = inception.inception_v1.default_image_size
checkpoints_dir = './tmp/checkpoints'
flowers_data_dir = './tmp/data/tf_records'

if not tf.gfile.Exists(checkpoints_dir):
    tf.gfile.MakeDirs(checkpoints_dir)

def load_batch(dataset, batch_size=32, height=299, width=299, is_training=False):
    """Loads a single batch of data.

    Args:
      dataset: The dataset to load.
      batch_size: The number of images in the batch.
      height: The size of each image after preprocessing.
      width: The size of each image after preprocessing.
      is_training: Whether or not we're currently training or evaluating.

    Returns:
      images: A Tensor of size [batch_size, height, width, 3], image samples that have been preprocessed.
      images_raw: A Tensor of size [batch_size, height, width, 3], image samples that can be used for visualization.
      labels: A Tensor of size [batch_size], whose values range between 0 and dataset.num_classes.
    """
    data_provider = slim.dataset_data_provider.DatasetDataProvider(
        dataset, common_queue_capacity=32,
        common_queue_min=8)
    image_raw, label = data_provider.get(['image', 'label'])

    # Preprocess image for usage by Inception.
    image = inception_preprocessing.preprocess_image(image_raw, height, width, is_training=is_training)

    # Preprocess the image for display purposes.
    image_raw = tf.expand_dims(image_raw, 0)
    image_raw = tf.image.resize_images(image_raw, [height, width])
    image_raw = tf.squeeze(image_raw)

    # Batch it up.
    images, images_raw, labels = tf.train.batch(
        [image, image_raw, label],
        batch_size=batch_size,
        num_threads=1,
        capacity=2 * batch_size)

    return images, images_raw, labels

def get_init_fn():
    """Returns a function run by the chief worker to warm-start the training."""
    checkpoint_exclude_scopes = ["InceptionV1/Logits", "InceptionV1/AuxLogits"]

    exclusions = [scope.strip() for scope in checkpoint_exclude_scopes]

    variables_to_restore = []
    for var in slim.get_model_variables():
        excluded = False
        for exclusion in exclusions:
            if var.op.name.startswith(exclusion):
                excluded = True
                break
        if not excluded:
            variables_to_restore.append(var)

    return slim.assign_from_checkpoint_fn(
        os.path.join(checkpoints_dir, 'inception_v1.ckpt'),
        variables_to_restore)

train_dir = './tmp/inception_finetuned/'

with tf.Graph().as_default():
    tf.logging.set_verbosity(tf.logging.INFO)

    dataset = flowers.get_split('train', flowers_data_dir)
    images, _, labels = load_batch(dataset, height=image_size, width=image_size)

    # Create the model, use the default arg scope to configure the batch norm parameters.
    with slim.arg_scope(inception.inception_v1_arg_scope()):
        logits, _ = inception.inception_v1(images, num_classes=dataset.num_classes, is_training=True)

    # Specify the loss function:
    one_hot_labels = slim.one_hot_encoding(labels, dataset.num_classes)
    slim.losses.softmax_cross_entropy(logits, one_hot_labels)
    total_loss = slim.losses.get_total_loss()

    # Create some summaries to visualize the training process:
    tf.summary.scalar('losses/Total_Loss', total_loss)

    # Specify the optimizer and create the train op:
    optimizer = tf.train.AdamOptimizer(learning_rate=0.01)
    train_op = slim.learning.create_train_op(total_loss, optimizer)

    train_writer = tf.summary.FileWriter(train_dir)
    # train_writer.add_summary()

    # Run the training:
    final_loss = slim.learning.train(
        train_op,
        logdir=train_dir,
        init_fn=get_init_fn(),
        number_of_steps=2,
        summary_writer=train_writer,
        trace_every_n_steps=2)

    print('Finished training. Last batch loss %f' % final_loss)
Related
There are a large number of questions related to TensorBoard issues, but in those cases TensorBoard doesn't run at all, shows nothing, or gives some kind of error. In my case, TensorBoard appears to run without errors and I can see the losses plot, but it doesn't appear to pick up more than one value.
Upvotes: 1
Views: 1406
Reputation: 2997
I switched the code around and went down very different rabbit holes, but I think the problem was that I overlooked the save_summaries_secs flag: if it is set high (e.g. the default of 600 seconds), it takes a long time for the metrics in TensorBoard to update.
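A minimal sketch of the fix, reusing the slim.learning.train call from the question (the 5-second interval is an arbitrary value chosen for illustration):

final_loss = slim.learning.train(
    train_op,
    logdir=train_dir,
    init_fn=get_init_fn(),
    number_of_steps=2,
    summary_writer=train_writer,
    # Write summaries every 5 seconds instead of the default 600,
    # so the losses plot in TensorBoard updates almost immediately.
    save_summaries_secs=5,
    trace_every_n_steps=2)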
Upvotes: 0