Savannah Madison

Reputation: 667

Trouble understanding why the dimension of the weight (theta) matrix is 3x4

I was following Andrew Ng's course on ML, and in the week 4 slides on neural networks (Model Representation I), he mentions that the dimension of the weight matrix is 3x4, as shown below: Lecture Slide

I know there is a formula that says that if there are sj nodes in layer j and s(j+1) nodes in layer j+1, then the dimension of the matrix mapping layer j to layer j+1 is s(j+1) x (sj + 1).

But I don't know where the formula comes from, and hence I am not able to understand the above example.

Upvotes: 1

Views: 1536

Answers (2)

Sonia Samipillai

Reputation: 620

From the image we understand the following:

  1. j is the layer number.
  2. sj is the number of nodes in layer j.

For example, in the first layer, j = 1 and the number of nodes is s1 = 3.

In the architecture shown, we have three layers:

  1. Input layer (j = 1 and s1 = 3)
  2. Hidden layer (j = 2 and s2 = 3)
  3. Output layer (j = 3 and s3 = 1)

Now, theta is the matrix of weights between two layers; it is also known as the weights or parameters. We have three layers and hence two matrices, Theta1 and Theta2.

weight_matrix1 (Theta1) maps the input layer to the hidden layer.

weight_matrix2 (Theta2) maps the hidden layer to the output layer.

By formula:

The dimensions of weight_matrix1 (Theta1, between layers 1 and 2) = s2 x (s1 + 1) = 3 x (3 + 1) = 3 x 4

The dimensions of weight_matrix2 (Theta2, between layers 2 and 3) = s3 x (s2 + 1) = 1 x (3 + 1) = 1 x 4

We add 1 to account for the bias unit x0. For matrix multiplication, the number of columns in the weight matrix must equal the number of rows of the (bias-augmented) activation vector it multiplies. We know the sizes of layer 1 and layer 2 in this example, from which we calculate the dimensions of the weight matrix theta.
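As a sanity check, here is a minimal NumPy sketch of how these shapes line up in a forward pass. The variable names (Theta1, a1, h, etc.) just follow the course notation, and the sigmoid activation is an assumption about what the slide uses:

```python
import numpy as np

# Layer sizes from the example: input s1 = 3, hidden s2 = 3, output s3 = 1
s1, s2, s3 = 3, 3, 1

# Theta1 maps layer 1 -> layer 2, Theta2 maps layer 2 -> layer 3
Theta1 = np.random.randn(s2, s1 + 1)   # shape (3, 4)
Theta2 = np.random.randn(s3, s2 + 1)   # shape (1, 4)

x = np.random.randn(s1)                # one input example with 3 features

a1 = np.concatenate(([1.0], x))         # add bias unit x0 -> shape (4,)
a2 = 1 / (1 + np.exp(-(Theta1 @ a1)))   # (3, 4) @ (4,) -> (3,)
a2 = np.concatenate(([1.0], a2))        # add bias unit -> shape (4,)
h  = 1 / (1 + np.exp(-(Theta2 @ a2)))   # (1, 4) @ (4,) -> (1,)

print(Theta1.shape, Theta2.shape, h.shape)  # (3, 4) (1, 4) (1,)
```

If Theta1 were not 3 x 4, the product Theta1 @ a1 would fail, because a1 has 4 entries (3 features plus the bias unit).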

Upvotes: 0

samnaction

Reputation: 1254

The size of a Theta (weight) matrix is (outputs x inputs).

The input includes a bias unit.

The output doesn't include the bias unit.

In the diagram, it will be [3 x (3+1)]. Here the additional 1 is the bias unit added to the input.

Hence the simple formula is s(j+1) x (sj + 1), which here gives 3 x (3 + 1) = 3 x 4.
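A tiny illustrative sketch of this rule (theta_shape is a hypothetical helper, not part of the course code):

```python
def theta_shape(s_j, s_j_plus_1):
    """Shape of the weight matrix mapping layer j (s_j units) to layer j+1."""
    return (s_j_plus_1, s_j + 1)  # outputs x (inputs + bias)

print(theta_shape(3, 3))  # (3, 4) -> Theta1 in the diagram
print(theta_shape(3, 1))  # (1, 4) -> Theta2 in the diagram
```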

Upvotes: 2
