Taiko

Reputation: 1329

MLP (ReLU) stops learning after a few iterations (TensorFlow)

2-layer MLP (ReLU) + softmax

After 20 iterations, TensorFlow just gives up and stops updating any weights or biases.

I initially thought my ReLUs were dying, so I displayed histograms to make sure none of them were 0. And none of them are!

They just stop changing after a few iterations and the cross entropy is still high. ReLU, sigmoid and tanh give the same results. Tweaking the GradientDescentOptimizer learning rate from 0.01 to 0.5 also doesn't change much.

There has to be a bug somewhere. Like an actual bug in my code. I can't even overfit a small sample set!

Here are my histograms and here's my code; if anyone could check it out, that would be a major help.

We have 3000 samples, each with 6 values between 0 and 255, to classify into two classes: [1,0] or [0,1] (I made sure to randomise the order).
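For clarity, here is a small sketch of the assumed data layout (the values and array names are illustrative, not the real data):

    import numpy as np

    # Illustrative only: 3000 samples, 6 features each, values in [0, 255],
    # with one-hot labels [1,0] or [0,1]
    features = np.random.randint(0, 256, size=(3000, 6)).astype(np.float32)
    labels = np.eye(2, dtype=np.float32)[np.random.randint(0, 2, size=3000)]

    # Shuffle features and labels together so the order is randomised
    perm = np.random.permutation(len(features))
    trainData = (features[perm], labels[perm])

(self.trainData in the code below is assumed to hold such a (features, labels) pair.)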

    def nn_layer(input_tensor, input_dim, output_dim, layer_name, act=tf.nn.relu):
        with tf.name_scope(layer_name):
            weights = tf.Variable(tf.truncated_normal([input_dim, output_dim], stddev=1.0 / math.sqrt(float(6))))
            tf.summary.histogram('weights', weights)

            biases = tf.Variable(tf.constant(0.4, shape=[output_dim]))
            tf.summary.histogram('biases', biases)

            preactivate = tf.matmul(input_tensor, weights) + biases
            tf.summary.histogram('pre_activations', preactivate)

            #act=tf.nn.relu
            activations = act(preactivate, name='activation')
            tf.summary.histogram('activations', activations)

            return activations


    #We have 3000 samples with 6 values between 0 and 255 to classify into two classes
    x = tf.placeholder(tf.float32, [None, 6])
    y = tf.placeholder(tf.float32, [None, 2])

    #After normalisation, input is between 0 and 1
    normalised = tf.scalar_mul(1/255,x)

    #Two layers
    hidden1 = nn_layer(normalised, 6, 4, "hidden1")
    hidden2 = nn_layer(hidden1, 4, 2, "hidden2")

    #Finish by a softmax
    softmax = tf.nn.softmax(hidden2)

    #Defining loss, accuracy etc..
    cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=softmax))      
    tf.summary.scalar('cross_entropy', cross_entropy)

    correct_prediction = tf.equal(tf.argmax(softmax, 1), tf.argmax(y, 1))

    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32)) 
    tf.summary.scalar('accuracy', accuracy)

    #Init session and writers and misc
    session = tf.Session()

    train_writer = tf.summary.FileWriter('log', session.graph)
    train_writer.add_graph(session.graph)

    init= tf.global_variables_initializer()
    session.run(init)

    merged = tf.summary.merge_all()

    #Train
    train_step = tf.train.GradientDescentOptimizer(0.05).minimize(cross_entropy)

    batch_x, batch_y = self.trainData
    for _ in range(1000):
        session.run(train_step, {x: batch_x, y: batch_y})
        #Every 10 steps, add to the summary
        if _ % 10 == 0: 
            s = session.run(merged, {x: batch_x, y: batch_y})
            train_writer.add_summary(s, _)


    #Evaluate
    evaluate_x, evaluate_y = self.evaluateData
    print(session.run(accuracy, {x: batch_x, y: batch_y}))
    print(session.run(accuracy, {x: evaluate_x, y: evaluate_y}))

Hidden Layer 1. The output isn't zero, so that's not a dying ReLU problem. But still, the weights are constant! TF didn't even try to modify them.

Same for Hidden Layer 2. TF tried tweaking them a bit and gave up pretty fast.

Cross entropy does decrease, but stays staggeringly high.

EDIT: LOTS of mistakes in my code. The first one is that 1/255 = 0 in Python... I changed it to 1.0/255.0 and my code came to life.

So basically, my input was multiplied by 0 and the neural network was completely blind. It tried to get the best result it could while blind and then gave up, which totally explains its behaviour.
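A quick demonstration of the pitfall (this is Python 2 behaviour; in Python 3, / is always float division):

    # Python 2: dividing two integers performs floor division
    print(1 / 255)      # 0  -> the whole input gets multiplied by 0
    print(1.0 / 255.0)  # ~0.0039215686
    print(1 / 255.0)    # ~0.0039215686 (one float operand is enough)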

I was also applying a softmax twice... Fixing that helped as well. And by trying different learning rates and different numbers of epochs, I finally found something that works.
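In other words, tf.nn.softmax_cross_entropy_with_logits already applies a softmax internally, so it must be fed the raw, unscaled outputs of the last layer (the logits), not softmax outputs. A minimal sketch (output and y refer to the tensors in the code below):

    # Feed raw logits to the loss; the softmax is applied inside this op
    cross_entropy = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=output))

    # If probabilities are needed elsewhere (e.g. to inspect predictions),
    # apply the softmax separately, outside of the loss computation
    probabilities = tf.nn.softmax(output)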

Here is the final working code:

    def runModel(self):

        def nn_layer(input_tensor, input_dim, output_dim, layer_name, act=tf.nn.relu):
            with tf.name_scope(layer_name):

                #This is a standard weight initialisation for neural networks with ReLU.
                #I divide by math.sqrt(float(6)) because my input has 6 values
                weights = tf.Variable(tf.truncated_normal([input_dim, output_dim], stddev=1.0 / math.sqrt(float(6))))
                tf.summary.histogram('weights', weights)

                #I chose this bias myself. It works. Not sure why.
                biases = tf.Variable(tf.constant(0.4, shape=[output_dim]))
                tf.summary.histogram('biases', biases)

                preactivate = tf.matmul(input_tensor, weights) + biases
                tf.summary.histogram('pre_activations', preactivate)

                #Some neurons will have ReLU as their activation function
                #Some won't have any activation function
                if act == "None":
                    activations = preactivate
                else:
                    activations = act(preactivate, name='activation')
                    tf.summary.histogram('activations', activations)

                return activations


        #We have 3000 samples with 6 values between 0 and 255 to classify into two classes
        x = tf.placeholder(tf.float32, [None, 6])
        y = tf.placeholder(tf.float32, [None, 2])

        #After normalisation, input is between 0 and 1
        #Normalising the input really helps. Nothing is doable without it
        #But my ERROR was to write 1/255. Because in Python
        #1/255 = 0 (integer division)
        #while 1.0/255.0 = 0.0039215686... (float division)
        normalised = tf.scalar_mul(1.0/255.0, x)

        #Three layers total. The first one is just a matrix multiplication
        input = nn_layer(normalised, 6, 4, "input", act="None")
        #The second one has a ReLU after a matrix multiplication
        hidden1 = nn_layer(input, 4, 4, "hidden", act=tf.nn.relu)
        #The last one is also just a matrix multiplication
        #WARNING! No softmax here! Because later we call a function
        #that implicitly does a softmax,
        #and it's bad practice to do two softmaxes one after the other
        output = nn_layer(hidden1, 4, 2, "output", act="None")

        #Tried different learning rates
        #A higher learning rate means finding a result faster,
        #but it could be a local minimum
        #A lower learning rate means we need many more epochs
        learning_rate = 0.03

        with tf.name_scope('learning_rate_'+str(learning_rate)):
            #Defining loss, accuracy etc..
            cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=output))
            tf.summary.scalar('cross_entropy', cross_entropy)

            correct_prediction = tf.equal(tf.argmax(output, 1), tf.argmax(y, 1))

            accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
            tf.summary.scalar('accuracy', accuracy)

        #Init session, writers and misc
        session = tf.Session()

        train_writer = tf.summary.FileWriter('log', session.graph)
        train_writer.add_graph(session.graph)

        init = tf.global_variables_initializer()
        session.run(init)

        merged = tf.summary.merge_all()

        #Train
        train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(cross_entropy)

        batch_x, batch_y = self.trainData
        for step in range(1000):
            session.run(train_step, {x: batch_x, y: batch_y})
            #Every 10 steps, add to the summary
            if step % 10 == 0:
                s = session.run(merged, {x: batch_x, y: batch_y})
                train_writer.add_summary(s, step)


        #Evaluate
        evaluate_x, evaluate_y = self.evaluateData
        print(session.run(accuracy, {x: batch_x, y: batch_y}))
        print(session.run(accuracy, {x: evaluate_x, y: evaluate_y}))

Final results after fixing

Upvotes: 3

Views: 664

Answers (2)

vipulnj

Reputation: 135

Just in case someone needs it in the future:

I had initialized my two-layer network's weights with np.random.randn, but the network refused to learn. Using the He initialization (for the ReLU layers) and the Xavier initialization (for the softmax layer) totally worked.
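For reference, a rough sketch of those initializations in plain NumPy (fan_in and fan_out are the layer's input and output sizes; the exact scaling conventions vary slightly between sources):

    import numpy as np

    def he_init(fan_in, fan_out):
        # He initialization: variance 2/fan_in, commonly used for ReLU layers
        return np.random.randn(fan_in, fan_out) * np.sqrt(2.0 / fan_in)

    def xavier_init(fan_in, fan_out):
        # Xavier/Glorot initialization: variance 2/(fan_in + fan_out),
        # commonly used for softmax/tanh/sigmoid layers
        return np.random.randn(fan_in, fan_out) * np.sqrt(2.0 / (fan_in + fan_out))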

Upvotes: 0

andrewchauzov

Reputation: 1009

I'm afraid you have to reduce your learning rate. It's too high. A high learning rate usually leads you to a local minimum rather than the global one.

Try 0.001, 0.0001 or even 0.00001. Or make your learning rate flexible.

I did not check the code, so first try to tune the LR.
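For example, one way to make the learning rate flexible in TensorFlow 1.x is an exponentially decaying rate (the decay settings below are just placeholders):

    global_step = tf.Variable(0, trainable=False)

    # Start at 0.01 and multiply by 0.96 every 100 steps (example values only)
    learning_rate = tf.train.exponential_decay(
        0.01, global_step, decay_steps=100, decay_rate=0.96, staircase=True)

    train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(
        cross_entropy, global_step=global_step)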

Upvotes: 5
