lina

Reputation: 293

Learning rate initialization for char-RNN implemented in TensorFlow

I have recently been reproducing the char-RNN described in http://karpathy.github.io/2015/05/21/rnn-effectiveness/. There are existing TensorFlow implementations, and the one I am referring to is at https://github.com/sherjilozair/char-rnn-tensorflow/blob/master/model.py. I have a question regarding the following lines in that code:

    loss = seq2seq.sequence_loss_by_example([self.logits],
            [tf.reshape(self.targets, [-1])],
            [tf.ones([args.batch_size * args.seq_length])],
            args.vocab_size)                                              # 1
    self.cost = tf.reduce_sum(loss) / args.batch_size / args.seq_length  # 2
    self.final_state = last_state                                        # 3
    self.lr = tf.Variable(0.0, trainable=False)                          # 4
    tvars = tf.trainable_variables()                                     # 5
    grads, _ = tf.clip_by_global_norm(tf.gradients(self.cost, tvars),
            args.grad_clip)                                              # 6
    optimizer = tf.train.AdamOptimizer(self.lr)                          # 7
    self.train_op = optimizer.apply_gradients(zip(grads, tvars))         # 8

My question is about line #4: why are we setting the learning rate to 0? Is setting it to 0 the best way to initialize the learning rate?

Upvotes: 0

Views: 180

Answers (1)

jasekp

Reputation: 1010

Looking through the code, it appears the learning rate is assigned another value before it is ever used:

    sess.run(tf.assign(model.lr, args.learning_rate * (args.decay_rate ** e)))

This is necessary because the learning rate is set to decay over time and the Adam optimizer is only initialized once. Any initial value should work, but zero seems the most aesthetically pleasing to me.
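For context, here is a rough sketch of how a training loop in that style reassigns the decayed rate at the start of each epoch. The surrounding names (`args.num_epochs`, `data_loader`, the feed keys) are assumptions based on the linked repository, not an exact copy of its train.py:

    for e in range(args.num_epochs):
        # Overwrite the placeholder 0.0 with the exponentially decayed
        # rate for this epoch before any training step runs.
        sess.run(tf.assign(model.lr, args.learning_rate * (args.decay_rate ** e)))
        for b in range(data_loader.num_batches):
            x, y = data_loader.next_batch()
            feed = {model.input_data: x, model.targets: y}
            train_loss, _ = sess.run([model.cost, model.train_op], feed)

Because the assignment runs before the first `train_op` step of every epoch, the 0.0 used at graph-construction time never influences a gradient update.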

Upvotes: 1
