Reputation: 2173
I am coding gradient descent in MATLAB. For two features, the update step is:
temp0 = theta(1,1) - (alpha/m)*sum((X*theta-y).*X(:,1));
temp1 = theta(2,1) - (alpha/m)*sum((X*theta-y).*X(:,2));
theta(1,1) = temp0;
theta(2,1) = temp1;
However, I want to vectorize this code and be able to apply it to any number of features. Working out the vectorization, it appears that what I am trying to do is a matrix multiplication:
theta = theta - (alpha/m) * (X' * (X*theta-y));
This looks right, but when I tried it, I realized that it doesn't work for gradient descent because the parameters are not updated simultaneously.
So how can I vectorize this code and make sure the parameters are updated at the same time?
Upvotes: 22
Views: 20141
Reputation: 23101
In Python, a vectorized gradient descent implementation for linear regression with MSE loss may look like the following:
import numpy as np

def grad_step(X, y, θ, α):
    # one gradient descent step for the MSE loss
    m = X.shape[0]
    return θ - (α / m) * 2 * X.T @ (X @ θ - y)

def compute_loss(X, y, θ):
    # mean squared error
    return np.mean((X @ θ - y) ** 2)

# run gradient descent
θ = np.zeros(X.shape[1]).reshape(-1, 1)
for _ in range(100):
    θ = grad_step(X, y, θ, α=0.01)
Using matrix differentiation rules on the cost J(θ) = (1/m)·(Xθ − y)ᵀ(Xθ − y), the gradient works out to ∇J(θ) = (2/m)·Xᵀ(Xθ − y), which is exactly the term applied in grad_step above.
Upvotes: 0
Reputation: 43
Here is the vectorized form of gradient descent; it works for me in Octave.
Remember that X is a matrix with ones in the first column (since theta_0 is multiplied by 1). Each column of X is a feature (n of them), and each row is a training example (m of them), so X is an m x (n+1) matrix.
The column vector y could be, for example, the house prices.
It's good to have a cost function to check whether you are approaching a minimum (see the sketch after the function below).
Choose a value for alpha, maybe alpha = 0.001, and try changing it each time you run the code. num_iters is the number of times you want it to run.
function theta = gradientDescent(X, y, theta, alpha, num_iters)
    m = length(y); % number of training examples
    for iter = 1:num_iters
        theta = theta - (alpha/m) * X' * (X*theta - y);
    end
end
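As a companion to the cost-function note above, here is a minimal squared-error cost sketch in Octave (computeCost is a name introduced here for illustration, and the data and parameter values in the usage lines are assumptions):

function J = computeCost(X, y, theta)
    % mean squared error with the usual 1/(2m) convention
    m = length(y);
    J = (1/(2*m)) * sum((X*theta - y).^2);
end

% example usage (assumes X, y already loaded):
% theta = zeros(size(X,2), 1);
% theta = gradientDescent(X, y, theta, 0.001, 1500);
% computeCost(X, y, theta)   % should shrink as theta improves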
See the full explanation here: https://www.coursera.org/learn/machine-learning/resources/QQx8l
Upvotes: 1
Reputation: 13
I am very new to this topic, but here is my take:
If you compute X*theta beforehand, then the vectorized operation that adjusts theta does not need a temporary variable.
In other words: if you compute X*theta while updating the theta vector element by element, theta(1) updates before theta(2) and hence changes X*theta for the remaining elements.
But if we first compute X*theta as y_pred and then do the vectorized operation on theta, it will be fine.
So my suggestion is (without using temp):
y_pred = X*theta;   % theta is [1;1] and X is an m x 2 matrix
theta = theta - (alpha/m) * (X' * (y_pred - y));
Please correct me if I am wrong.
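To make the point concrete, here is a small Octave sketch (illustrative only) contrasting an element-by-element update, which is not simultaneous, with the vectorized update, which is:

% sequential (wrong): theta(1) is overwritten before theta(2) is computed,
% so the X*theta inside the loop changes between iterations
for j = 1:length(theta)
    theta(j) = theta(j) - (alpha/m) * sum((X*theta - y) .* X(:,j));
end

% simultaneous (right): the whole right-hand side is evaluated with the
% old theta before anything is assigned
theta = theta - (alpha/m) * (X' * (X*theta - y));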
Upvotes: 0
Reputation: 263
For the vectorized version, try the following (two steps, to make the simultaneous update explicit):
gradient = (alpha/m) * X' * (X*theta - y);
theta = theta - gradient;
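One possible way to use these two steps inside a loop, with a simple stopping rule on the gradient norm (the tolerance and iteration count are placeholders, not part of the answer above):

tol = 1e-6;
for iter = 1:num_iters
    gradient = (alpha/m) * X' * (X*theta - y);
    theta = theta - gradient;
    if norm(gradient) < tol   % stop once the steps become tiny
        break;
    end
end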
Upvotes: 23
Reputation: 53
In order to update them simultaneously, you can keep the values of theta(1..n) in a temporary vector and, after the operation, copy them back into the original theta vector.
This is the code that I use for this purpose:
% temp update
tempChange = zeros(length(theta), 1);
tempChange = theta - (alpha/m) * (X' * (X*theta - y));
% actual update
theta = tempChange;
Upvotes: 2
Reputation: 12689
Your vectorization is correct. I also tried both versions of your code, and they gave me the same theta. Just remember not to use your updated theta in your second implementation.
This also works, though it is less simplified than your 2nd implementation:
Error = X * theta - y;
for i = 1:2
    S(i) = sum(Error .* X(:,i));
end
theta = theta - alpha * (1/m) * S';
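If you want this loop form to handle any number of features, as the question asks, a small generalization (a sketch, using size(X,2) for the number of columns) could look like:

Error = X * theta - y;        % computed once, so the update stays simultaneous
S = zeros(1, size(X,2));      % one partial gradient per feature column
for i = 1:size(X,2)
    S(i) = sum(Error .* X(:,i));
end
theta = theta - alpha * (1/m) * S';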
Upvotes: 10