Reputation: 23
I'm trying to figure out how to backpropagate through a GRU recurrent network, but I'm having trouble understanding the GRU architecture precisely.
The image below shows a GRU cell with 3 neural networks, receiving the concatenated previous hidden state and the input vector as its input.
The image I referenced for backpropagation, however, shows the inputs being forwarded through W and U for each of the gates, added together, and then having the appropriate activation functions applied.
As an example, the equation for the update gate shown on Wikipedia is:

z_t = sigmoid(W_z x_t + U_z h_{t-1})

Can somebody explain to me what W and U represent?
EDIT:
In most of the sources I found, W and U are usually referred to as "weights", so my best guess is that W and U each represent their own neural network, but this would contradict the first image.
If somebody could give an example of how W and U would work in a simple GRU, that would be helpful.
Sources for the images: https://cran.r-project.org/web/packages/rnn/vignettes/GRU_units.html https://towardsdatascience.com/animated-rnn-lstm-and-gru-ef124d06cf45
Upvotes: 1
Views: 216
Reputation: 46
W and U are matrices whose values are learnt during training (a.k.a. neural network weights). The matrix W multiplies the vector x_t and produces a new vector. Similarly, the matrix U multiplies the vector h_{t-1} and produces a new vector. Those two new vectors are added together, and then each component of the result is passed to the sigmoid function.
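To make this concrete, here is a minimal NumPy sketch of just the update gate. The dimensions and variable names (W_z, U_z, x_t, h_prev) are made up for illustration; there are no "two separate networks", only two matrix-vector products:

    import numpy as np

    # Toy dimensions, chosen arbitrarily for this example
    input_size = 3    # length of x_t
    hidden_size = 4   # length of h_{t-1}

    rng = np.random.default_rng(0)

    # W_z maps the input x_t into hidden space; U_z maps the previous
    # hidden state h_{t-1} into hidden space. Both are learned parameters
    # (here just initialized randomly instead of trained).
    W_z = rng.standard_normal((hidden_size, input_size))
    U_z = rng.standard_normal((hidden_size, hidden_size))

    def sigmoid(v):
        return 1.0 / (1.0 + np.exp(-v))

    x_t = rng.standard_normal(input_size)      # current input vector
    h_prev = rng.standard_normal(hidden_size)  # previous hidden state

    # z_t = sigmoid(W_z x_t + U_z h_{t-1}): two matrix-vector products,
    # a vector addition, then an element-wise sigmoid.
    z_t = sigmoid(W_z @ x_t + U_z @ h_prev)
    print(z_t)  # vector of length hidden_size, each entry in (0, 1)

This also reconciles the two images you linked: multiplying the concatenated vector [x_t; h_{t-1}] by the single block matrix [W_z U_z] gives exactly W_z x_t + U_z h_{t-1}, so the "concatenation" picture and the "separate W and U" picture describe the same computation.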
Upvotes: 1