DaveTheAl

Reputation: 2155

The dark mystery of TensorFlow and TensorBoard when using cross-validation in training: weird graphs showing up

This is the first time I'm using TensorBoard, and I am getting a weird bug with my graph.

This is what I get if I open up the 'STEP' window: [screenshot of the STEP view]

However, this is what I get if I open up the 'RELATIVE' window (similarly when opening the 'WALL' window): [screenshot of the RELATIVE view]

In addition to that, to test the performance of the model, I apply cross-validation every few steps. The accuracy on this cross-validation set drops from ~10% (random guessing) to 0% after some time. I am not sure where I made a mistake; I am not a TensorFlow pro, but I suspect the problem lies in the graph construction. The code looks as follows (a rough sketch of my training loop is included after the model code):

def initialize_parameters():
    global_step = tf.get_variable("global_step", shape=[], trainable=False, 
            initializer=tf.constant_initializer(1), dtype=tf.int64)

    Weights = {
        "W_Conv1": tf.get_variable("W_Conv1", shape=[3, 3, 1, 64],
            initializer=tf.random_normal_initializer(mean=0.00, stddev=0.01),
        ),
...
        "W_Affine3": tf.get_variable("W_Affine3", shape=[128, 10],
            initializer=tf.random_normal_initializer(mean=0.00, stddev=0.01),
        ),
    }
    Bias = {
        "b_Conv1": tf.get_variable("b_Conv1", shape=[1, 16, 8, 64],
            initializer=tf.random_normal_initializer(mean=0.00, stddev=0.01),
        ),
...
        "b_Affine3": tf.get_variable("b_Affine3", shape=[1, 10],
            initializer=tf.random_normal_initializer(mean=0.00, stddev=0.01),
        ),
    }
    return Weights, Bias, global_step


def build_model(W, b, global_step):

    keep_prob = tf.placeholder(tf.float32)
    learning_rate = tf.placeholder(tf.float32)
    is_training = tf.placeholder(tf.bool)

    ## 0.Layer: Input
    X_input = tf.placeholder(shape=[None, 16, 8], dtype=tf.float32, name="X_input")
    y_input = tf.placeholder(shape=[None, 10], dtype=tf.int8, name="y_input")

    inputs = tf.reshape(X_input, (-1, 16, 8, 1)) #must be a 4D input into the CNN layer
    inputs = tf.contrib.layers.batch_norm(
                        inputs,
                        center=False,
                        scale=False,
                        is_training=is_training
                    )

    ## 1. Layer: Conv1 (64, stride=1, 3x3)
    inputs = layer_conv(inputs, W['W_Conv1'], b['b_Conv1'], is_training)
... 

    ## 7. Layer: Affine 3 (128 units)
    logits = layer_affine(inputs, W['W_Affine3'], b['b_Affine3'], is_training)

    ## 8. Layer: Softmax, or loss otherwise
    predict = tf.nn.softmax(logits) #should be an argmax, or should this even go through


    ## Output: Loss functions and model trainers
    loss = tf.reduce_mean(
                tf.nn.softmax_cross_entropy_with_logits( 
                      labels=y_input, 
                      logits=logits
                )
           )
    trainer = tf.train.GradientDescentOptimizer(
                learning_rate=learning_rate
              ) 
    updateModel = trainer.minimize(loss, global_step=global_step)

    ## Test Accuracy
    correct_pred = tf.equal(tf.argmax(y_input, 1), tf.argmax(predict, 1))
    acc_op = tf.reduce_mean(tf.cast(correct_pred, "float"))

    return X_input, y_input, loss, predict, updateModel, keep_prob, learning_rate, is_training
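For context, the surrounding training loop looks roughly like this (a simplified sketch; the batch variables, step counts, and hyperparameters are placeholders, not my exact code):

import numpy as np
import tensorflow as tf

W, b, global_step = initialize_parameters()
(X_input, y_input, loss, predict, updateModel,
 keep_prob, learning_rate, is_training) = build_model(W, b, global_step)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(10000):                        # placeholder step count
        sess.run(updateModel, feed_dict={
            X_input: X_batch, y_input: y_batch,      # placeholder batches
            keep_prob: 0.5, learning_rate: 1e-3,
            is_training: True})
        if step % 100 == 0:  # cross-validate every few steps
            p = sess.run(predict, feed_dict={
                X_input: X_val, keep_prob: 1.0, is_training: False})
            acc = np.mean(np.argmax(p, 1) == np.argmax(y_val, 1))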

Now I suspect my error is in the definition of the graph's loss function, but I am not sure. Any idea what the problem could be? Or does the model converge correctly, and are all of these artifacts expected?

Upvotes: 1

Views: 653

Answers (2)

Alberto Perez

Reputation: 1077

Yes, I think you are running the same model more than once with your cross-validation implementation. Just try, at the end of every loop:

session.close()
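For example, a minimal sketch of the fold loop (the fold count is hypothetical, and resetting the default graph is my addition on top of closing the session, to avoid "variable already exists" errors from tf.get_variable):

import tensorflow as tf

for fold in range(5):                        # hypothetical number of folds
    tf.reset_default_graph()                 # drop the previous fold's variables
    W, b, global_step = initialize_parameters()
    ops = build_model(W, b, global_step)
    session = tf.Session()
    session.run(tf.global_variables_initializer())
    # ... train and evaluate this fold ...
    session.close()                          # release resources before the next fold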

Upvotes: 1

VS_FF

Reputation: 2363

I suspect you are getting such strange output (I have seen similar myself) because you are running the same model more than once and it is saving the TensorBoard output to exactly the same place. I can't see in your code where you name the file that receives the output. Try to make the file path in this part of the code unique:

`summary_writer = tf.summary.FileWriter(unique_path_to_log, sess.graph)`
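For instance, a run-specific, timestamped subdirectory keeps each run's event files apart (the "logs" base directory here is an assumption, not taken from the question):

import os
import time

run_dir = os.path.join("logs", "run_{}".format(int(time.time())))  # unique per run
summary_writer = tf.summary.FileWriter(run_dir, sess.graph)

Pointing TensorBoard at the parent "logs" directory then shows each run as a separate curve instead of mixing their steps together.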

You can also try locating the directory where your existing output has been written and removing the files with the older (or newer?) timestamps; that way TensorBoard will not be confused about which ones to use.

Upvotes: 1
