Reputation: 715
I'm trying to adjust the learning rate of the gradient descent algorithm, and I would like to be able to confirm whether my changes to learning_rate are actually having an effect on my Theano training function.
Sample code:
#set up the updates
for param in params:
    updates.append((param, param - learning_rate * T.grad(cost, param)))

#set up the training function
train = theano.function(inputs=[index], outputs=[cost], updates=updates,
                        givens={x: self.X[index:index + mini_batch_size, :]})

#run through the minibatches
for epoch in range(n_epochs):
    for row in range(0, self.m, mini_batch_size):
        cost = train(row)
    #occasionally adjust the learning rate
    learning_rate = learning_rate / 2.0
Is this going to work as I desire? How can I confirm?
It seems like this will not work, based on this little test:
import theano as th
import theano.tensor

x = th.tensor.dscalar()
rate = 5.0
f = th.function(inputs=[x], outputs=2 * x * rate)
print(f(10))
>> 100.0
rate = 0.0
print(f(10))
>> 100.0
What is the correct way to go about this?
Upvotes: 2
Views: 325
Reputation: 1596
Inspired by @Daniel Renshaw's answer, you can try the following:
learning_rate = theano.shared(0.01)

for param in params:
    updates.append((param, param - learning_rate * T.grad(cost, param)))

#set up the training function
train = theano.function(inputs=[index], outputs=[cost], updates=updates,
                        givens={x: self.X[index:index + mini_batch_size, :]})

#run through the minibatches
for epoch in range(n_epochs):
    for row in range(0, self.m, mini_batch_size):
        cost = train(row)
    #occasionally adjust the learning rate
    learning_rate.set_value(learning_rate.get_value() / 2)
Basically, you make the learning rate a shared variable and update its value manually with set_value whenever you want to change it.
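To address the "How can I confirm?" part of the question, one quick check (a minimal, self-contained sketch in the spirit of the scalar test from the question, not the original training code) is to return the shared rate as an extra output, or simply read it back with get_value():

import theano as th
import theano.tensor

x = th.tensor.dscalar()
rate = th.shared(5.0)

# listing the shared variable as a second output makes the rate that was
# actually used visible on every call
f = th.function(inputs=[x], outputs=[2 * x * rate, rate])

print(f(10))             # result 100.0, current rate 5.0
rate.set_value(0.0)
print(f(10))             # result 0.0, current rate 0.0 -- the change took effect
print(rate.get_value())  # 0.0

The same check works on the full training function, since learning_rate above is the same kind of shared variable.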
Upvotes: 1
Reputation: 34187
The problem is that your code compiles the learning rate into the computation graph as a constant. If you want to change the rate, you'll need to use a Theano variable to represent it in the computation graph and then provide a value when the function is executed. This can be done in two ways:
1. Pass the rate each time the function is executed, by treating it as an input value and representing it in the computation graph as a scalar tensor.
2. Store the rate in a Theano shared variable. Change the variable manually before executing the function.
There are two variations of the second approach. In the first, you manually adjust the rate value before the execution. In the second, you specify a symbolic expression explaining how the rate should be updated on each execution.
These three approaches are demonstrated in the sample code below, based on the edited portion of the question.
import theano as th
import theano.tensor

# Original version (changing rate doesn't affect theano function output)
x = th.tensor.dscalar()
rate = 5.0
f = th.function(inputs=[x], outputs=2 * x * rate)
print(f(10))   # 100.0
rate = 0.0
print(f(10))   # still 100.0 -- the rate was compiled into the graph as a constant

# New version using an input value
x = th.tensor.dscalar()
rate = th.tensor.scalar()
f = th.function(inputs=[x, rate], outputs=2 * x * rate)
print(f(10, 5.0))   # 100.0
print(f(10, 0.0))   # 0.0

# New version using a shared variable with manual update
x = th.tensor.dscalar()
rate = th.shared(5.0)
f = th.function(inputs=[x], outputs=2 * x * rate)
print(f(10))        # 100.0
rate.set_value(0.0)
print(f(10))        # 0.0

# New version using a shared variable with automatic update
x = th.tensor.dscalar()
rate = th.shared(5.0)
updates = [(rate, rate / 2.0)]
f = th.function(inputs=[x], outputs=2 * x * rate, updates=updates)
print(f(10))   # 100.0 (the update halves the rate after each call)
print(f(10))   # 50.0
print(f(10))   # 25.0
print(f(10))   # 12.5
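For completeness, here is a rough sketch of what the first approach (passing the rate as an input) might look like in the training loop from the question; params, cost, index, x, self.X, self.m, mini_batch_size and n_epochs are the question's own names and are assumed here, along with the question's implied import theano and import theano.tensor as T:

#the learning rate becomes a symbolic input instead of a baked-in Python constant
lr = T.scalar('lr')

updates = [(param, param - lr * T.grad(cost, param)) for param in params]

train = theano.function(inputs=[index, lr], outputs=[cost], updates=updates,
                        givens={x: self.X[index:index + mini_batch_size, :]})

learning_rate = 0.01
for epoch in range(n_epochs):
    for row in range(0, self.m, mini_batch_size):
        batch_cost = train(row, learning_rate)
    #halving the plain Python float now works, because its value is passed in on every call
    learning_rate = learning_rate / 2.0

This keeps the rate entirely on the Python side, at the cost of one extra argument per call; the shared-variable versions avoid that argument but require set_value (or an updates entry) to change the rate.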
Upvotes: 2