Reputation: 3477
I have been following a course about neural networks in Coursera and came across this model:
I understand that the values of z1, z2 and so on are the values from the linear regression that will be put into an activation function. The problem that I have is when the author says that there should be one matrix of weights and a vector of the inputs, like this:
I know that the vector of Xs has a dimension of 3 x 1 because there are three inputs, but why is the array of Ws of dimensions 4 x 3? I can deduce that it has four rows because those are the weights w1, w2, w3 and w4, each corresponding to one of the values a1...a4, but what is inside that array? Are its elements something like:
[ w1^T  w1^T  w1^T ]
[ w2^T  w2^T  w2^T ]
[ ...              ] ?
so when I multiply the first row by the vector of inputs, for example, I will get:
w1^T x1 + w1^T x2 + w1^T x3 = w1^T (x1 + x2 + x3) = w1^T X
I have thought about it, but I cannot really grasp what this array contains, even though I know that in the end I will have a 4 x 1 vector that corresponds to the values of z. Any help?
Thanks
Upvotes: 8
Views: 23777
Reputation: 31
Too late to answer this, but I hope it helps somebody.
We know that every input needs a weight associated with it. There are 3 inputs and 4 neurons:
1st neuron: x1 - w11, x2 - w12, x3 - w13
2nd neuron: x1 - w21, x2 - w22, x3 - w23
3rd neuron: x1 - w31, x2 - w32, x3 - w33
4th neuron: x1 - w41, x2 - w42, x3 - w43
Just as 1 unknown requires 1 equation (3x + 4 = 12) and 2 unknowns require 2 equations (3x + 4y = 12 and 4x + 3y = 24), here there are 3 x 4 = 12 unknown weights, so we would need 12 linear equations to solve for them.
Now we have a matrix that is 3x4, but we cannot multiply it directly with a 3x1 input, since the rule of matrix multiplication states that it only works for sizes (m x n) times (n x k), giving (m x k).
So we transpose the weight matrix to 4x3; multiplying it by the 3x1 input results in a 4x1 matrix.
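To make the shapes concrete, here is a minimal NumPy sketch (the variable names W, x, z and the weight values are mine, chosen just to match the notation above): each row of the 4x3 weight matrix holds the three weights of one neuron.

```python
import numpy as np

# Weight matrix: one row per neuron, one column per input (4 neurons, 3 inputs)
W = np.array([[0.1, 0.2, 0.3],   # w11, w12, w13 -> 1st neuron
              [0.4, 0.5, 0.6],   # w21, w22, w23 -> 2nd neuron
              [0.7, 0.8, 0.9],   # w31, w32, w33 -> 3rd neuron
              [1.0, 1.1, 1.2]])  # w41, w42, w43 -> 4th neuron

x = np.array([[1.0], [2.0], [3.0]])  # input vector, shape (3, 1)

z = W @ x       # (4, 3) @ (3, 1) -> (4, 1), one z value per neuron
print(z.shape)  # (4, 1)
```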
Upvotes: 3
Reputation: 1204
Note that in your course the feature vector x is a column vector, and the weight matrix W represents 4 neurons, each with 3 weight parameters, in this way:
But in order to do the math, you have to transpose it; that's why you use W^T.
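A minimal sketch of what this answer describes, assuming (as the answer does) that each neuron's weights are stored as a column, so the product only works after transposing:

```python
import numpy as np

# One column per neuron: shape (3 inputs, 4 neurons) -- the layout this answer assumes
W = np.ones((3, 4))
x = np.ones((3, 1))

# W @ x would fail: shapes (3, 4) and (3, 1) do not align
z = W.T @ x     # (4, 3) @ (3, 1) -> (4, 1)
print(z.shape)  # (4, 1)
```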
Upvotes: 0
Reputation: 503
As a rule of thumb, the weight matrix has the dimensions (number of inputs to the layer) x (number of neurons in the layer).
Therefore the weight matrix is (3x4). If you take the transpose, it becomes (4x3).
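A quick sketch of that rule of thumb applied to a stack of layers (the layer sizes here are made up for illustration):

```python
import numpy as np

layer_sizes = [3, 4, 5, 1]  # inputs, two hidden layers, output (example values)

# Rule of thumb: W for each layer is (inputs to layer) x (neurons in layer)
weights = [np.zeros((n_in, n_out))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

for i, W in enumerate(weights, start=1):
    print(f"layer {i}: W has shape {W.shape}")  # (3, 4), (4, 5), (5, 1)
```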
Upvotes: 11
Reputation: 2656
If x is 3x1, then a weight matrix of size Nx3 will give you a hidden layer with N units. In your case N = 4 (see the network schematic). This follows from the fact that multiplying a Nx3 matrix with a 3x1 vector gives a Nx1 vector as output; hence, N hidden units.
Each row of the weight matrix defines the weights for a single hidden unit, so the scalar product of w_1 and x (plus bias) gives z_1:
z_1 = w_1^T x + b_1
In the end, writing all quantities as vectors and matrices simply allows you to use succinct linear algebra notation:
z = Wx + b,  a = g(z)
where we assume that the activation g is applied element-wise.
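A minimal NumPy sketch of that notation (sigmoid is chosen here arbitrarily as the element-wise activation; the course may use a different one):

```python
import numpy as np

def sigmoid(z):
    # Element-wise activation
    return 1.0 / (1.0 + np.exp(-z))

W = np.random.randn(4, 3)  # N = 4 hidden units, 3 inputs
b = np.random.randn(4, 1)  # one bias per hidden unit
x = np.random.randn(3, 1)  # input column vector

z = W @ x + b   # (4, 1)
a = sigmoid(z)  # activation applied element-wise, still (4, 1)
print(a.shape)  # (4, 1)
```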
Upvotes: 1