newhere

Reputation: 137

Simple Gradient Descent Implementation Error

I have tried to implement gradient descent optimisation of the MSE function on a toy linear regression problem.

import numpy as np

# Data points
x = np.array([1, 2, 3, 4])
y = np.array([1, 1, 2, 2])

# MSE function
f = lambda a, b: 1 / len(x) * np.sum(np.power(y - (a * x + b), 2))


# Gradient
def grad_f(v_coefficients):
    a = v_coefficients[0, 0]
    b = v_coefficients[1, 0]
    return np.array([1 / len(x) * np.sum(2 * (y - (a * x + b)) * x),
                     1 / len(x) * np.sum(2 * (y - (a * x + b)))]).reshape(2, 1)


# Gradient Decent with epsilon as tol vector and alpha as the step/learning rate
def gradient_decent(v_prev):
    tol = 10 ** -3
    epsilon = np.array([tol * np.ones([2, 1], int)])
    alpha = 0.2
    v_next = v_prev - alpha * grad_f(v_prev)
    if (np.abs(v_next - v_prev) <= epsilon).all():
        return v_next
    else:
        gradient_decent(v_next)


# v_0 is the initial guess
v_0 = np.array([[1], [1]])


gradient_decent(v_0)

I have tried different alpha values, but the code never converges (infinite recursion). It seems that the issue is with the stop condition of the recursion, but after a few runs v_next and v_prev bounce between -infinity and +infinity.

Upvotes: 0

Views: 52

Answers (1)

Anvar Kurmukov

Reputation: 662

It's great that you are learning machine learning (^_^) by implementing some base algorithms by yourself. Regarding your question, there are two problems in your code. The first one is mathematical: the sign in

def grad_f(v_coefficients):
    a = v_coefficients[0, 0]
    b = v_coefficients[1, 0]
    return np.array([1 / len(x) * np.sum(2 * (y - (a * x + b)) * x),
                     1 / len(x) * np.sum(2 * (y - (a * x + b)))]).reshape(2, 1)

should be

return -np.array(...)

since differentiating (y - (a * x + b))**2 by the chain rule pulls a minus sign out of the inner term, so both partial derivatives of the MSE carry a leading minus.
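For completeness, here are both partial derivatives written out (my derivation of the standard MSE gradient, not part of the original answer), with MSE(a, b) = (1/n) * sum((y - (a*x + b))^2):

\frac{\partial}{\partial a} \mathrm{MSE}(a, b)
    = \frac{1}{n} \sum_{i=1}^{n} 2 \bigl(y_i - (a x_i + b)\bigr) \cdot (-x_i)
    = -\frac{2}{n} \sum_{i=1}^{n} x_i \bigl(y_i - (a x_i + b)\bigr)

\frac{\partial}{\partial b} \mathrm{MSE}(a, b)
    = \frac{1}{n} \sum_{i=1}^{n} 2 \bigl(y_i - (a x_i + b)\bigr) \cdot (-1)
    = -\frac{2}{n} \sum_{i=1}^{n} \bigl(y_i - (a x_i + b)\bigr)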

The second one is a programming problem: this kind of code will not return a result in Python:

def add(x):
    new_x = x + 1
    if new_x > 10:
        return new_x
    else:
        add(new_x)

you must use return in both branches of the if statement, so it should be

def add(x):
    new_x = x + 1
    if new_x > 10:
        return new_x
    else:
        return add(new_x)
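A quick way to see the difference (my example, not from the original answer):

print(add(5))  # missing return: prints None, because the recursive
               # branch falls off the end of the function
               # with return added: prints 11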

There is also a minor issue with the alpha coefficient: for these particular data points, alpha=0.2 is too big for the algorithm to converge, so you need a smaller alpha (see the quick check after the code below). I also slightly refactored your initial code using the numpy broadcasting convention (https://numpy.org/doc/stable/user/basics.broadcasting.html) to get the following result:

import numpy as np

# Data points
x = np.array([1, 2, 3, 4])
y = np.array([1, 1, 2, 2])

# MSE function
f = lambda a, b: np.mean(np.power(y - (a * x + b), 2))


# Gradient
def grad_f(v_coefficients):
    a = v_coefficients[0, 0]
    b = v_coefficients[1, 0]
    return -np.array([np.mean(2 * (y - (a * x + b)) * x),
                     np.mean(2 * (y - (a * x + b)))]).reshape(2, 1)


# Gradient Decent with epsilon as tol vector and alpha as the step/learning rate
def gradient_decent(v_prev):
    tol = 1e-3
    # epsilon = np.array([tol * np.ones([2, 1], int)]) do not need this, due to numpy broadcasting rules
    alpha = 0.1
    v_next = v_prev - alpha * grad_f(v_prev)
    if (np.abs(v_next - v_prev) <= tol).all():  # compare the step size against tol, not alpha
        return v_next
    else:
        return gradient_decent(v_next)


# v_0 is the initial guess
v_0 = np.array([[1], [1]])


gradient_decent(v_0)
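To make the "alpha=0.2 is too big" point concrete, here is a quick check I added (not part of the original answer). For linear least squares, gradient descent converges only if alpha < 2 / lambda_max, where lambda_max is the largest eigenvalue of the Hessian of the MSE, which here is (2/n) * X^T X:

import numpy as np

x = np.array([1, 2, 3, 4])
y = np.array([1, 1, 2, 2])

# Design matrix with a bias column; the MSE Hessian is (2/n) * X^T X
X = np.stack([x, np.ones_like(x)], axis=1)
H = 2 / len(x) * X.T @ X
lam_max = np.linalg.eigvalsh(H).max()

print(lam_max)       # ~16.7
print(2 / lam_max)   # ~0.12, so alpha=0.2 diverges while alpha=0.1 converges

# Sanity check: the exact least-squares fit, which gradient_decent(v_0)
# should approach
print(np.polyfit(x, y, 1))  # [0.4, 0.5]

One more design note: because gradient_decent calls itself recursively, very small alpha values can exhaust Python's default recursion limit (about 1000 frames); rewriting it as an iterative loop avoids that.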

Upvotes: 1
