Reputation: 1429
import numpy
import theano
import theano.tensor as T

class RNNSLU(object):
    ''' elman neural net model '''
    def __init__(self, nh, nc, ne, de, cs):
        '''
        nh :: dimension of the hidden layer
        nc :: number of classes
        ne :: number of word embeddings in the vocabulary
        de :: dimension of the word embeddings
        cs :: word window context size
        '''
        # parameters of the model
        self.emb = theano.shared(name='embeddings',
                                 value=0.2 * numpy.random.uniform(-1.0, 1.0,
                                 (ne+1, de))
                                 # add one for padding at the end
                                 .astype(theano.config.floatX))
        self.wx = theano.shared(name='wx',
                                value=0.2 * numpy.random.uniform(-1.0, 1.0,
                                (de * cs, nh))
                                .astype(theano.config.floatX))
        self.wh = theano.shared(name='wh',
                                value=0.2 * numpy.random.uniform(-1.0, 1.0,
                                (nh, nh))
                                .astype(theano.config.floatX))
        self.w = theano.shared(name='w',
                               value=0.2 * numpy.random.uniform(-1.0, 1.0,
                               (nh, nc))
                               .astype(theano.config.floatX))
        self.bh = theano.shared(name='bh',
                                value=numpy.zeros(nh,
                                dtype=theano.config.floatX))
        self.b = theano.shared(name='b',
                               value=numpy.zeros(nc,
                               dtype=theano.config.floatX))
        self.h0 = theano.shared(name='h0',
                                value=numpy.zeros(nh,
                                dtype=theano.config.floatX))

        # bundle
        self.params = [self.emb, self.wx, self.wh, self.w,
                       self.bh, self.b, self.h0]

        # input (defined earlier in the tutorial): idxs is a matrix of
        # word-window indices, x stacks the corresponding embeddings,
        # one row per word in the sentence
        idxs = T.imatrix()
        x = self.emb[idxs].reshape((idxs.shape[0], de * cs))

        def recurrence(x_t, h_tm1):
            h_t = T.nnet.sigmoid(T.dot(x_t, self.wx)
                                 + T.dot(h_tm1, self.wh) + self.bh)
            s_t = T.nnet.softmax(T.dot(h_t, self.w) + self.b)
            return [h_t, s_t]

        [h, s], _ = theano.scan(fn=recurrence,
                                sequences=x,
                                outputs_info=[self.h0, None],
                                n_steps=x.shape[0])
I am following this Theano tutorial about RNNs (http://deeplearning.net/tutorial/rnnslu.html), but I have two questions about it. First, in this tutorial the recurrence function looks like this:
def recurrence(x_t, h_tm1):
    h_t = T.nnet.sigmoid(T.dot(x_t, self.wx) + T.dot(h_tm1, self.wh) + self.bh)
    s_t = T.nnet.softmax(T.dot(h_t, self.w) + self.b)
    return [h_t, s_t]
I wonder why h0 is not added into h_t, i.e. h_t = T.nnet.sigmoid(T.dot(x_t, self.wx) + T.dot(h_tm1, self.wh) + self.bh + self.h0)?
Second, why is outputs_info=[self.h0, None]? I know outputs_info gives the initialization values, so I would have expected something like outputs_info=[self.bh + self.h0, T.nnet.softmax(T.dot(self.bh + self.h0, self.w_h2y) + self.b_h2y)].
Upvotes: 0
Views: 161
Reputation: 782
def recurrence(x_t, h_tm1):
    h_t = T.nnet.sigmoid(T.dot(x_t, self.wx)
                         + T.dot(h_tm1, self.wh) + self.bh)
    s_t = T.nnet.softmax(T.dot(h_t, self.w) + self.b)
    return [h_t, s_t]
So, first you ask why we don't use h0 in the recurrence function. Let's break down this part:

h_t = T.nnet.sigmoid(T.dot(x_t, self.wx) + T.dot(h_tm1, self.wh) + self.bh)
What we expect here is three terms.

The first term is the input layer multiplied by a weight matrix: T.dot(x_t, self.wx).

The second term is the previous hidden layer multiplied by another weight matrix (this is what makes the network recurrent): T.dot(h_tm1, self.wh). Note that you must have a weight matrix here; what you proposed would basically add self.h0 as an extra bias instead.

The third term is the bias of the hidden layer: self.bh.
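If it helps, here is a plain numpy sketch of that sum (with made-up sizes de*cs = 6 and nh = 3, not taken from the tutorial), just to show that all three terms share the hidden layer's shape:

import numpy

de_cs, nh = 6, 3                                   # illustrative sizes only
x_t = numpy.random.uniform(-1.0, 1.0, de_cs)       # current input window, shape (6,)
h_tm1 = numpy.zeros(nh)                            # previous hidden state, shape (3,)
wx = numpy.random.uniform(-1.0, 1.0, (de_cs, nh))  # input-to-hidden weights, (6, 3)
wh = numpy.random.uniform(-1.0, 1.0, (nh, nh))     # hidden-to-hidden weights, (3, 3)
bh = numpy.zeros(nh)                               # hidden bias, (3,)

# each of the three terms has shape (nh,), so they can simply be summed
pre_activation = x_t.dot(wx) + h_tm1.dot(wh) + bh
h_t = 1.0 / (1.0 + numpy.exp(-pre_activation))     # sigmoid
print(h_t.shape)                                   # (3,)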
Now, after every iteration we want to keep track of the hidden layer activations. However, self.h0 only provides the activations for the very first step; inside the loop, what recurrence needs is the previous step's activations, and that is exactly what h_tm1 receives.
[h, s], _ = theano.scan(fn=recurrence,
                        sequences=x,
                        outputs_info=[self.h0, None],
                        n_steps=x.shape[0])
So, look at the scan function again. You are right that outputs_info=[self.h0, None] initializes the values, but each entry is also linked to one of the outputs. There are two outputs from recurrence(), namely [h_t, s_t]. So what outputs_info also does is this: after every iteration, the first returned value, h_t, takes over the role that self.h0 played at the first step (the shared variable self.h0 itself is not modified; scan just carries the value along). The second element of outputs_info is None, because we do not save or initialize any value for s_t (each entry of outputs_info is linked to the corresponding return value of the recurrence function in this way).
In the next iteration, that carried value is used again as input, such that h_tm1 holds the h_t returned by the previous step, and self.h0 only at the very first step. But since recurrence must always receive some value for h_tm1, we have to initialize it, and that is what self.h0 is for. Since we never feed s_t back, there is nothing to initialize for it, so we leave the second entry as None.
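To see this mechanism in isolation, here is a minimal toy scan (a running sum, not from the tutorial): the first output is carried from step to step, the second is derived from it and never fed back:

import numpy
import theano
import theano.tensor as T

x = T.dvector('x')
h0 = theano.shared(numpy.float64(0.0), name='h0')

def step(x_t, h_tm1):
    h_t = h_tm1 + x_t   # first output: fed back as h_tm1 in the next step
    s_t = 2.0 * h_t     # second output: computed, but never fed back
    return [h_t, s_t]

[h, s], _ = theano.scan(fn=step,
                        sequences=x,
                        outputs_info=[h0, None])

f = theano.function([x], [h, s])
print(f(numpy.array([1.0, 2.0, 3.0])))
# h is [1., 3., 6.] (the running sum), s is [2., 6., 12.]

Replace the running sum by the recurrence above and you get exactly the tutorial's scan: h collects the hidden states and s the softmax outputs.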
Granted, the theano.scan() function can be very confusing at times, and I'm new at it too. But this is what I understood from doing this same tutorial.
Upvotes: 1