Reputation: 1429
import numpy
import theano
import theano.tensor as T

class RNNSLU(object):
    ''' elman neural net model '''
    def __init__(self, nh, nc, ne, de, cs):
        '''
        nh :: dimension of the hidden layer
        nc :: number of classes
        ne :: number of word embeddings in the vocabulary
        de :: dimension of the word embeddings
        cs :: word window context size
        '''
        # parameters of the model
        self.emb = theano.shared(name='embeddings',
                                 value=0.2 * numpy.random.uniform(-1.0, 1.0,
                                 (ne+1, de))
                                 # add one for padding at the end
                                 .astype(theano.config.floatX))
        self.wx = theano.shared(name='wx',
                                value=0.2 * numpy.random.uniform(-1.0, 1.0,
                                (de * cs, nh))
                                .astype(theano.config.floatX))
        self.wh = theano.shared(name='wh',
                                value=0.2 * numpy.random.uniform(-1.0, 1.0,
                                (nh, nh))
                                .astype(theano.config.floatX))
        self.w = theano.shared(name='w',
                               value=0.2 * numpy.random.uniform(-1.0, 1.0,
                               (nh, nc))
                               .astype(theano.config.floatX))
        self.bh = theano.shared(name='bh',
                                value=numpy.zeros(nh,
                                dtype=theano.config.floatX))
        self.b = theano.shared(name='b',
                               value=numpy.zeros(nc,
                               dtype=theano.config.floatX))
        self.h0 = theano.shared(name='h0',
                                value=numpy.zeros(nh,
                                dtype=theano.config.floatX))

        # bundle
        self.params = [self.emb, self.wx, self.wh, self.w,
                       self.bh, self.b, self.h0]

        # input (defined earlier in the tutorial): idxs is a matrix of
        # word-window indices, x stacks the corresponding embeddings,
        # one row per word in the sentence
        idxs = T.imatrix()
        x = self.emb[idxs].reshape((idxs.shape[0], de * cs))

        def recurrence(x_t, h_tm1):
            h_t = T.nnet.sigmoid(T.dot(x_t, self.wx)
                                 + T.dot(h_tm1, self.wh) + self.bh)
            s_t = T.nnet.softmax(T.dot(h_t, self.w) + self.b)
            return [h_t, s_t]

        [h, s], _ = theano.scan(fn=recurrence,
                                sequences=x,
                                outputs_info=[self.h0, None],
                                n_steps=x.shape[0])
I am following this Theano tutorial about RNNs (http://deeplearning.net/tutorial/rnnslu.html), but I have two questions about it. First, in this tutorial the recurrence function looks like this:
def recurrence(x_t, h_tm1):
    h_t = T.nnet.sigmoid(T.dot(x_t, self.wx) + T.dot(h_tm1, self.wh) + self.bh)
    s_t = T.nnet.softmax(T.dot(h_t, self.w) + self.b)
    return [h_t, s_t]
I wonder why h0 is not added into h_t, i.e. h_t = T.nnet.sigmoid(T.dot(x_t, self.wx) + T.dot(h_tm1, self.wh) + self.bh + self.h0)?
Second, why is outputs_info=[self.h0, None]? I know outputs_info gives the initialization values, so I would have expected something like outputs_info=[self.bh + self.h0, T.nnet.softmax(T.dot(self.bh + self.h0, self.w_h2y) + self.b_h2y)].
Upvotes: 0
Views: 161
Reputation: 782
def recurrence(x_t, h_tm1):
    h_t = T.nnet.sigmoid(T.dot(x_t, self.wx)
                         + T.dot(h_tm1, self.wh) + self.bh)
    s_t = T.nnet.softmax(T.dot(h_t, self.w) + self.b)
    return [h_t, s_t]
So, first you ask why we don't use h0 in the recurrence function. Let's break down this part:

h_t = T.nnet.sigmoid(T.dot(x_t, self.wx) + T.dot(h_tm1, self.wh) + self.bh)
What we expect here is three terms.

The first term is the input layer multiplied by a weight matrix: T.dot(x_t, self.wx).

The second term is the previous hidden layer multiplied by another weight matrix (this is what makes the network recurrent): T.dot(h_tm1, self.wh). Note that you must have a weight matrix here; what you proposed would basically add self.h0 as an extra bias instead.

The third term is the bias of the hidden layer: self.bh.
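If it helps, here is a plain numpy sketch of that sum (with made-up sizes de*cs = 6 and nh = 3, not taken from the tutorial), just to show that all three terms share the hidden layer's shape:

import numpy

de_cs, nh = 6, 3                                   # illustrative sizes only
x_t = numpy.random.uniform(-1.0, 1.0, de_cs)       # current input window, shape (6,)
h_tm1 = numpy.zeros(nh)                            # previous hidden state, shape (3,)
wx = numpy.random.uniform(-1.0, 1.0, (de_cs, nh))  # input-to-hidden weights, (6, 3)
wh = numpy.random.uniform(-1.0, 1.0, (nh, nh))     # hidden-to-hidden weights, (3, 3)
bh = numpy.zeros(nh)                               # hidden bias, (3,)

# each of the three terms has shape (nh,), so they can simply be summed
pre_activation = x_t.dot(wx) + h_tm1.dot(wh) + bh
h_t = 1.0 / (1.0 + numpy.exp(-pre_activation))     # sigmoid
print(h_t.shape)                                   # (3,)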
Now, after every iteration we want to keep track of the hidden layer activations. However, self.h0 only provides the activations for the very first step; inside the loop, what recurrence needs is the previous step's activations, and that is exactly what h_tm1 receives.
[h, s], _ = theano.scan(fn=recurrence,
                        sequences=x,
                        outputs_info=[self.h0, None],
                        n_steps=x.shape[0])
So, look at the scan function again. You are right that outputs_info=[self.h0, None] initializes the values, but each entry is also linked to one of the outputs. There are two outputs from recurrence(), namely [h_t, s_t]. So what outputs_info also does is this: after every iteration, the first returned value, h_t, takes over the role that self.h0 played at the first step (the shared variable self.h0 itself is not modified; scan just carries the value along). The second element of outputs_info is None, because we do not save or initialize any value for s_t (each entry of outputs_info is linked to the corresponding return value of the recurrence function in this way).
In the next iteration, that carried value is used again as input, such that h_tm1 holds the h_t returned by the previous step, and self.h0 only at the very first step. But since recurrence must always receive some value for h_tm1, we have to initialize it, and that is what self.h0 is for. Since we never feed s_t back, there is nothing to initialize for it, so we leave the second entry as None.
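To see this mechanism in isolation, here is a minimal toy scan (a running sum, not from the tutorial): the first output is carried from step to step, the second is derived from it and never fed back:

import numpy
import theano
import theano.tensor as T

x = T.dvector('x')
h0 = theano.shared(numpy.float64(0.0), name='h0')

def step(x_t, h_tm1):
    h_t = h_tm1 + x_t   # first output: fed back as h_tm1 in the next step
    s_t = 2.0 * h_t     # second output: computed, but never fed back
    return [h_t, s_t]

[h, s], _ = theano.scan(fn=step,
                        sequences=x,
                        outputs_info=[h0, None])

f = theano.function([x], [h, s])
print(f(numpy.array([1.0, 2.0, 3.0])))
# h is [1., 3., 6.] (the running sum), s is [2., 6., 12.]

Replace the running sum by the recurrence above and you get exactly the tutorial's scan: h collects the hidden states and s the softmax outputs.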
Granted, the theano.scan() function can be very confusing at times, and I'm new at it too. But this is what I understood from doing this same tutorial.
Upvotes: 1