Little

Reputation: 3477

weight matrix dimension intuition in a neural network

I have been following a course about neural networks in Coursera and came across this model:

[figure: network diagram with three inputs x1...x3 feeding four hidden units a1...a4]

I understand that the values z1, z2, and so on are the linear combinations (as in linear regression) that will be put into an activation function. The problem I have is when the author says that there should be a single matrix of weights multiplied by a vector of the inputs, like this:

[figure: z = Wx, with W a 4 x 3 weight matrix and x a 3 x 1 input vector]

I know that the vector of xs has dimension 3 x 1 because there are three inputs, but why is the matrix of Ws of dimension 4 x 3? I can deduce that it has four rows because those are the weights w1, w2, w3 and w4 corresponding to each of the values a1...a4, but what is inside that matrix? Are its elements something like:

w1T w1T w1T
w2T w2T w2T
... ?

so that when I multiply the first row by x, for example, I would get:

w1T x1 + w1T x2 + w1T x3 = w1T (x1 + x2 + x3) = w1T X

I have thought about it, but I cannot really grasp what this matrix contains, even though I know that in the end I will have a 4 x 1 vector that corresponds to the values of z. Any help?

Thanks

Upvotes: 8

Views: 23777

Answers (4)

Mohammad Shariq

Reputation: 31

Too late to answer this, but I hope it helps somebody.

We know every input needs a weight attached to it. There are 3 inputs and there are 4 neurons:

1st neuron : x1 - w11
             x2 - w12 
             x3 - w13

similarly,
2nd neuron x1 - w21 
           x2 - w22
           x3 - w23

3 rd neuron x1 - w31 
            x2 - w32
            x3 - w33

4 th neuron x1 - w41
            x2 - w42
            x3 - w43

As we know, for 1 unknown we need 1 equation (3x + 4 = 12), and for 2 unknowns we need 2 equations (3x + 4y = 12 and 4x + 3y = 24). Here there are 4 x 3 = 12 unknown weights, so the matrix must contain all 12 of them.

Now suppose we arranged those weights as a 3 x 4 matrix. We could not multiply it with the 3 x 1 input, since the multiplication rule states that matrix multiplication only works for sizes (m x n)(n x k) = (m x k).

So we arrange the weight matrix as 4 x 3; multiplied by the 3 x 1 input, it gives a 4 x 1 result.

Upvotes: 3

Elinx

Reputation: 1204

Note that in your course the input vector stores the features as a column, and the weight matrix W holds the 3 weight parameters of each of the 4 neurons column by column:

W = [w1 w2 w3 w4]

but in order to do the math you have to transpose it; that's why the course uses W^T.

Upvotes: 0

Parul Singh

Reputation: 503

As a rule of thumb, the weight matrix has the following dimensions:

  • The number of rows equals the number of neurons in the previous layer (here the previous layer is the input layer), so 3.
  • The number of columns equals the number of neurons in the next layer, so 4.

Therefore the weight matrix is 3 x 4. Taking the transpose gives 4 x 3.
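Under this convention the transpose is what makes the multiplication line up, which a short NumPy check illustrates (random values, shapes are what matter):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 4))  # rows = 3 input units, columns = 4 next-layer units
x = rng.standard_normal((3, 1))  # 3x1 input vector

z = W.T @ x  # transposing gives (4x3) @ (3x1) -> (4x1)
print(z.shape)
```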

Upvotes: 11

cheersmate

Reputation: 2656

If x is 3x1, then a weight matrix of size Nx3 will give you a hidden layer with N units. In your case N = 4 (see the network schematic). This follows from the fact that multiplying a Nx3 matrix with a 3x1 vector gives a Nx1 vector as output, hence, N hidden units.

Each row of the weight matrix defines the weights for a single hidden unit, so the scalar product of w_1 and x (plus the bias) gives z_1:

z_1 = w_1^T x + b_1

In the end, writing all quantities as vectors and matrices simply allows you to use succinct linear algebra notation:

z = W x + b,    a = g(z)

where we assume that the activation g is applied element-wise.
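That vectorized forward pass can be sketched in NumPy as follows (the sigmoid activation is an assumption here; the course may use a different one):

```python
import numpy as np

def sigmoid(z):
    # element-wise logistic activation
    return 1.0 / (1.0 + np.exp(-z))

W = np.random.randn(4, 3)  # one row of weights per hidden unit
b = np.zeros((4, 1))       # one bias per hidden unit
x = np.random.randn(3, 1)  # input column vector

z = W @ x + b              # 4x1 vector of pre-activations, z_i = w_i . x + b_i
a = sigmoid(z)             # 4x1 vector of activations
print(a.shape)             # (4, 1)
```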

Upvotes: 1
