Theano multiplying by zero

Question

Can anybody explain the meaning of these two lines of code from https://github.com/Newmu/Theano-Tutorials/blob/master/4_modern_net.py?

    acc = theano.shared(p.get_value() * 0.)
    acc_new = rho * acc + (1 - rho) * g ** 2

Is it a mistake? Why do we initialize acc to zero and then multiply it by rho on the next line? It looks as though this achieves nothing and acc will remain zero. Would there be any difference if we replaced "rho * acc" with just "acc"?

The full function is given below:

    import theano
    import theano.tensor as T

    def RMSprop(cost, params, lr=0.001, rho=0.9, epsilon=1e-6):
        grads = T.grad(cost=cost, wrt=params)
        updates = []
        for p, g in zip(params, grads):
            # Accumulator: shared variable of zeros with the same shape as p
            acc = theano.shared(p.get_value() * 0.)
            # Running average of the squared gradient
            acc_new = rho * acc + (1 - rho) * g ** 2
            gradient_scaling = T.sqrt(acc_new + epsilon)
            g = g / gradient_scaling
            updates.append((acc, acc_new))
            updates.append((p, p - lr * g))
        return updates

Daniel Renshaw · Accepted Answer

This is just a way to tell Theano "create a shared variable and initialize its value to be zero in the same shape as p."
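As a small illustration of that initial value, here is a NumPy-only sketch. The array p_value is a made-up stand-in for what p.get_value() would return; no Theano is involved.

```python
import numpy as np

# Hypothetical parameter value standing in for p.get_value().
p_value = np.array([[1.5, -2.0, 3.0],
                    [0.5, 4.0, -1.0]], dtype=np.float32)

# Multiplying by 0. keeps the shape and yields zeros everywhere.
acc_init = p_value * 0.

# The same initial value, spelled out explicitly.
acc_init_explicit = np.zeros_like(p_value)

print(acc_init.shape)
print(np.array_equal(acc_init, acc_init_explicit))
```

One caveat with the multiplication trick: if the parameter ever contained inf or nan, multiplying by zero would produce nan in that position, whereas np.zeros_like would not.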

This RMSprop function is symbolic. It does not actually compute the RMSprop parameter updates; it only tells Theano how the parameter updates should be computed when the eventual Theano function is executed.

If you look further down the tutorial code you linked to, you'll see that the symbolic graph for the parameter updates is constructed by a call to RMSprop on line 67. These updates are then compiled into a Theano function, bound to the Python name train, on line 69, and train is executed many times on line 74, inside the for loops of lines 72 and 73. The Python function RMSprop is called only once, no matter how many times train is called inside those loops.

Within RMSprop, we are telling Theano that, for each parameter p, we need a new shared variable whose initial value has the same shape as p and is 0. throughout. We then tell Theano how to update both this new variable (unnamed as far as Theano is concerned, but named acc in Python) and the parameter p itself. These statements do not alter either p or acc; they only tell Theano how p and acc should be updated later, each time the compiled function (line 69) is executed (line 74).
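The "set up once, run many times" split can be imitated in plain Python with a closure. This is only an analogy, not how Theano works internally: build_rmsprop_step, state, and the constant gradient are all hypothetical names and values invented for the sketch.

```python
import numpy as np

def build_rmsprop_step(p_value, lr=0.001, rho=0.9, epsilon=1e-6):
    # Runs once, like the Python function RMSprop: the state is created here.
    state = {"p": p_value.copy(), "acc": p_value * 0.}

    def step(g):
        # Runs many times, like the compiled train function.
        acc_new = rho * state["acc"] + (1 - rho) * g ** 2
        state["p"] = state["p"] - lr * g / np.sqrt(acc_new + epsilon)
        state["acc"] = acc_new
        return state["acc"]

    return step

step = build_rmsprop_step(np.array([1.0, -1.0]))  # initialization happens here only
g = np.array([2.0, 2.0])                          # hypothetical constant gradient
first = step(g).copy()   # acc after one step: no longer zero
second = step(g).copy()  # acc after two steps: built on the previous value
print(first, second)
```

The zero initialization runs a single time, when build_rmsprop_step is called. Every later call to step starts from whatever acc the previous call left behind.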

The function executions on line 74 do not call the RMSprop Python function; they execute the compiled graph that RMSprop built. No initialization happens inside the compiled version, because it already happened when the Python function RMSprop ran. Each execution of train evaluates acc_new = rho * acc + (1 - rho) * g ** 2 using the current value of acc, not its initial value.
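To make the effect of rho concrete, here is a plain-Python sketch (no Theano) that applies the accumulator update repeatedly to a scalar, once with "rho * acc" and once with just "acc". The helper run, the flag decay_old, and the constant gradient of 2.0 are assumptions made up for illustration.

```python
def run(steps, g, rho=0.9, decay_old=True):
    # acc starts at zero, as in theano.shared(p.get_value() * 0.)
    acc = 0.0
    for _ in range(steps):
        old = rho * acc if decay_old else acc
        acc = old + (1 - rho) * g ** 2
    return acc

# On the very first step acc is zero, so the two variants agree.
same_at_first = run(1, g=2.0) == run(1, g=2.0, decay_old=False)

# After many steps they diverge.
with_rho = run(200, g=2.0)                        # settles near g ** 2
without_rho = run(200, g=2.0, decay_old=False)    # keeps growing with every step
print(same_at_first, with_rho, without_rho)
```

With rho, the accumulator is an exponential moving average of the squared gradient and stays bounded. Without it, the accumulator becomes a scaled running sum that only grows, so the effective step size lr / sqrt(acc_new + epsilon) would keep shrinking.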
