lina

Reputation: 293

Learning rate initialization for char-RNN implemented in TensorFlow

I have recently been reproducing the char-RNN described in http://karpathy.github.io/2015/05/21/rnn-effectiveness/. There are existing TensorFlow implementations, and the one I am referring to is at https://github.com/sherjilozair/char-rnn-tensorflow/blob/master/model.py. I have a question regarding the following lines in that code:

    loss = seq2seq.sequence_loss_by_example([self.logits],
            [tf.reshape(self.targets, [-1])],
            [tf.ones([args.batch_size * args.seq_length])],
            args.vocab_size)                                              # 1
    self.cost = tf.reduce_sum(loss) / args.batch_size / args.seq_length  # 2
    self.final_state = last_state                                        # 3
    self.lr = tf.Variable(0.0, trainable=False)                          # 4
    tvars = tf.trainable_variables()                                     # 5
    grads, _ = tf.clip_by_global_norm(tf.gradients(self.cost, tvars),
            args.grad_clip)                                              # 6
    optimizer = tf.train.AdamOptimizer(self.lr)                          # 7
    self.train_op = optimizer.apply_gradients(zip(grads, tvars))         # 8

My question is about line #4: why are we setting the learning rate to 0? Is setting it to 0 the best way to initialize the learning rate?

Upvotes: 0

Views: 180

Answers (1)

jasekp

Reputation: 1010

Looking through the code, it appears the learning rate is assigned another value before it is ever used:

    sess.run(tf.assign(model.lr, args.learning_rate * (args.decay_rate ** e)))

This is necessary because the learning rate is set to decay over time and the Adam optimizer is only initialized once. Any initial value should work, but zero seems the most aesthetically pleasing to me.
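For context, here is a rough sketch of how a training loop in that style reassigns the decayed rate at the start of each epoch. The surrounding names (`args.num_epochs`, `data_loader`, the feed keys) are assumptions based on the linked repository, not an exact copy of its train.py:

    for e in range(args.num_epochs):
        # Overwrite the placeholder 0.0 with the exponentially decayed
        # rate for this epoch before any training step runs.
        sess.run(tf.assign(model.lr, args.learning_rate * (args.decay_rate ** e)))
        for b in range(data_loader.num_batches):
            x, y = data_loader.next_batch()
            feed = {model.input_data: x, model.targets: y}
            train_loss, _ = sess.run([model.cost, model.train_op], feed)

Because the assignment runs before the first `train_op` step of every epoch, the 0.0 used at graph-construction time never influences a gradient update.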

Upvotes: 1
