Reputation: 2173
I am coding gradient descent in MATLAB. For two features, the update step is:
temp0 = theta(1,1) - (alpha/m)*sum((X*theta-y).*X(:,1));
temp1 = theta(2,1) - (alpha/m)*sum((X*theta-y).*X(:,2));
theta(1,1) = temp0;
theta(2,1) = temp1;
However, I want to vectorize this code and be able to apply it to any number of features. Working out the vectorization, it appears that what I am trying to do is a matrix multiplication:
theta = theta - (alpha/m) * (X' * (X*theta-y));
This looks right, but when I tried it, I realized that it doesn't work for gradient descent because the parameters are not updated simultaneously.
So how can I vectorize this code and make sure the parameters are updated at the same time?
Upvotes: 22
Views: 20141
Reputation: 23101
In Python, a vectorized gradient descent implementation for linear regression with MSE loss may look like the following:
import numpy as np

def grad_step(X, y, θ, α):
    # one gradient descent step for the MSE loss
    m = X.shape[0]
    return θ - (α / m) * 2 * X.T @ (X @ θ - y)

def compute_loss(X, y, θ):
    # mean squared error
    return np.mean((X @ θ - y) ** 2)

# run gradient descent
θ = np.zeros(X.shape[1]).reshape(-1, 1)
for _ in range(100):
    θ = grad_step(X, y, θ, α=0.01)
Using matrix differentiation rules on the cost J(θ) = (1/m)·(Xθ − y)ᵀ(Xθ − y), the gradient works out to ∇J(θ) = (2/m)·Xᵀ(Xθ − y), which is exactly the term applied in grad_step above.
Upvotes: 0
Reputation: 43
Here is the vectorized form of gradient descent; it works for me in Octave.
Remember that X is a matrix with ones in the first column (since theta_0 is multiplied by 1). Each column of X is a feature (n of them), and each row is a training example (m of them), so X is an m x (n+1) matrix.
The column vector y could be, for example, the house prices.
It's good to have a cost function to check whether you are approaching a minimum (see the sketch after the function below).
Choose a value for alpha, maybe alpha = 0.001, and try changing it each time you run the code. num_iters is the number of times you want it to run.
function theta = gradientDescent(X, y, theta, alpha, num_iters)
    m = length(y); % number of training examples
    for iter = 1:num_iters
        theta = theta - (alpha/m) * X' * (X*theta - y);
    end
end
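As a companion to the cost-function note above, here is a minimal squared-error cost sketch in Octave (computeCost is a name introduced here for illustration, and the data and parameter values in the usage lines are assumptions):

function J = computeCost(X, y, theta)
    % mean squared error with the usual 1/(2m) convention
    m = length(y);
    J = (1/(2*m)) * sum((X*theta - y).^2);
end

% example usage (assumes X, y already loaded):
% theta = zeros(size(X,2), 1);
% theta = gradientDescent(X, y, theta, 0.001, 1500);
% computeCost(X, y, theta)   % should shrink as theta improves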
See the full explanation here: https://www.coursera.org/learn/machine-learning/resources/QQx8l
Upvotes: 1
Reputation: 13
I am very new to this topic, but here is my take:
If you compute X*theta beforehand, then the vectorized operation that adjusts theta does not need a temporary variable.
In other words: if you compute X*theta while updating the theta vector element by element, theta(1) updates before theta(2) and hence changes X*theta for the remaining elements.
But if we first compute X*theta as y_pred and then do the vectorized operation on theta, it will be fine.
So my suggestion is (without using temp):
y_pred = X*theta;   % theta is [1;1] and X is an m x 2 matrix
theta = theta - (alpha/m) * (X' * (y_pred - y));
Please correct me if I am wrong.
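To make the point concrete, here is a small Octave sketch (illustrative only) contrasting an element-by-element update, which is not simultaneous, with the vectorized update, which is:

% sequential (wrong): theta(1) is overwritten before theta(2) is computed,
% so the X*theta inside the loop changes between iterations
for j = 1:length(theta)
    theta(j) = theta(j) - (alpha/m) * sum((X*theta - y) .* X(:,j));
end

% simultaneous (right): the whole right-hand side is evaluated with the
% old theta before anything is assigned
theta = theta - (alpha/m) * (X' * (X*theta - y));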
Upvotes: 0
Reputation: 263
For the vectorized version, try the following (two steps, to make the simultaneous update explicit):
gradient = (alpha/m) * X' * (X*theta - y);
theta = theta - gradient;
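One possible way to use these two steps inside a loop, with a simple stopping rule on the gradient norm (the tolerance and iteration count are placeholders, not part of the answer above):

tol = 1e-6;
for iter = 1:num_iters
    gradient = (alpha/m) * X' * (X*theta - y);
    theta = theta - gradient;
    if norm(gradient) < tol   % stop once the steps become tiny
        break;
    end
end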
Upvotes: 23
Reputation: 53
In order to update them simultaneously, you can keep the values of theta(1..n) in a temporary vector and, after the operation, copy them back into the original theta vector.
This is the code that I use for this purpose:
% temp update
tempChange = zeros(length(theta), 1);
tempChange = theta - (alpha/m) * (X' * (X*theta - y));
% actual update
theta = tempChange;
Upvotes: 2
Reputation: 12689
Your vectorization is correct. I also tried both versions of your code, and they gave me the same theta. Just remember not to use your updated theta in your second implementation.
This also works, though it is less simplified than your 2nd implementation:
Error = X * theta - y;
for i = 1:2
    S(i) = sum(Error .* X(:,i));
end
theta = theta - alpha * (1/m) * S';
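If you want this loop form to handle any number of features, as the question asks, a small generalization (a sketch, using size(X,2) for the number of columns) could look like:

Error = X * theta - y;        % computed once, so the update stays simultaneous
S = zeros(1, size(X,2));      % one partial gradient per feature column
for i = 1:size(X,2)
    S(i) = sum(Error .* X(:,i));
end
theta = theta - alpha * (1/m) * S';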
Upvotes: 10