Reputation: 23
I'm trying to figure out how to backpropagate through a GRU recurrent network, but I'm having trouble understanding the GRU architecture precisely.
The image below shows a GRU cell with 3 neural networks, receiving the concatenated previous hidden state and the input vector as its input.
The image I referenced for backpropagation, however, shows the inputs being forwarded through W and U for each of the gates, added together, and then having the appropriate activation functions applied.
As an example, the equation for the update gate shown on Wikipedia is:

z_t = sigmoid(W_z x_t + U_z h_{t-1})

Can somebody explain to me what W and U represent?
EDIT:
In most of the sources I found, W and U are usually referred to as "weights", so my best guess is that W and U each represent their own neural network, but this would contradict the first image.
If somebody could give an example of how W and U would work in a simple GRU, that would be helpful.
Sources for the images: https://cran.r-project.org/web/packages/rnn/vignettes/GRU_units.html https://towardsdatascience.com/animated-rnn-lstm-and-gru-ef124d06cf45
Upvotes: 1
Views: 216
Reputation: 46
W and U are matrices whose values are learnt during training (a.k.a. neural network weights). The matrix W multiplies the vector x_t and produces a new vector. Similarly, the matrix U multiplies the vector h_{t-1} and produces a new vector. Those two new vectors are added together, and then each component of the result is passed to the sigmoid function.
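To make this concrete, here is a minimal NumPy sketch of just the update gate. The dimensions and variable names (W_z, U_z, x_t, h_prev) are made up for illustration; there are no "two separate networks", only two matrix-vector products:

    import numpy as np

    # Toy dimensions, chosen arbitrarily for this example
    input_size = 3    # length of x_t
    hidden_size = 4   # length of h_{t-1}

    rng = np.random.default_rng(0)

    # W_z maps the input x_t into hidden space; U_z maps the previous
    # hidden state h_{t-1} into hidden space. Both are learned parameters
    # (here just initialized randomly instead of trained).
    W_z = rng.standard_normal((hidden_size, input_size))
    U_z = rng.standard_normal((hidden_size, hidden_size))

    def sigmoid(v):
        return 1.0 / (1.0 + np.exp(-v))

    x_t = rng.standard_normal(input_size)      # current input vector
    h_prev = rng.standard_normal(hidden_size)  # previous hidden state

    # z_t = sigmoid(W_z x_t + U_z h_{t-1}): two matrix-vector products,
    # a vector addition, then an element-wise sigmoid.
    z_t = sigmoid(W_z @ x_t + U_z @ h_prev)
    print(z_t)  # vector of length hidden_size, each entry in (0, 1)

This also reconciles the two images you linked: multiplying the concatenated vector [x_t; h_{t-1}] by the single block matrix [W_z U_z] gives exactly W_z x_t + U_z h_{t-1}, so the "concatenation" picture and the "separate W and U" picture describe the same computation.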
Upvotes: 1