Reputation: 137
I have tried a toy linear regression problem, implementing the optimisation of the MSE function using the gradient descent algorithm.
import numpy as np
# Data points
x = np.array([1, 2, 3, 4])
y = np.array([1, 1, 2, 2])
# MSE function
f = lambda a, b: 1 / len(x) * np.sum(np.power(y - (a * x + b), 2))
# Gradient
def grad_f(v_coefficients):
    a = v_coefficients[0, 0]
    b = v_coefficients[1, 0]
    return np.array([1 / len(x) * np.sum(2 * (y - (a * x + b)) * x),
                     1 / len(x) * np.sum(2 * (y - (a * x + b)))]).reshape(2, 1)
# Gradient descent with epsilon as tol vector and alpha as the step/learning rate
def gradient_decent(v_prev):
    tol = 10 ** -3
    epsilon = np.array([tol * np.ones([2, 1], int)])
    alpha = 0.2
    v_next = v_prev - alpha * grad_f(v_prev)
    if (np.abs(v_next - v_prev) <= epsilon).all():
        return v_next
    else:
        gradient_decent(v_next)
# v_0 is the initial guess
v_0 = np.array([[1], [1]])
gradient_decent(v_0)
I have tried different alpha values, but the code never converges (infinite recursion). It seems the issue is with the stop condition of the recursion, but after a few runs v_next and v_prev bounce between -infinity and infinity.
Upvotes: 0
Views: 52
Reputation: 662
It's great that you are learning machine learning (^_^) by implementing some base algorithms yourself. Regarding your question, there are two problems in your code. The first one is mathematical: the sign in
def grad_f(v_coefficients):
    a = v_coefficients[0, 0]
    b = v_coefficients[1, 0]
    return np.array([1 / len(x) * np.sum(2 * (y - (a * x + b)) * x),
                     1 / len(x) * np.sum(2 * (y - (a * x + b)))]).reshape(2, 1)
should be
return -np.array(...)
since the derivative of (y - (a * x + b)) ** 2 with respect to a is 2 * (y - (a * x + b)) * (-x): the chain rule pulls a factor of -1 out of the inner term, and the same minus sign appears in the derivative with respect to b.
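A quick way to catch sign mistakes like this is to compare the analytic gradient against a central finite-difference estimate. Here is a minimal debugging sketch I am adding for illustration (mse and numeric_grad are names of my own, not from your code):
import numpy as np

x = np.array([1, 2, 3, 4])
y = np.array([1, 1, 2, 2])

def mse(a, b):
    return np.mean((y - (a * x + b)) ** 2)

def numeric_grad(a, b, h=1e-6):
    # Central finite differences in a and b.
    return np.array([(mse(a + h, b) - mse(a - h, b)) / (2 * h),
                     (mse(a, b + h) - mse(a, b - h)) / (2 * h)])

print(numeric_grad(1.0, 1.0))  # ~[11.5, 4.0]: both components positive here,
                               # so the analytic gradient needs the leading minus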
The second one is a programming issue: this kind of code will not return a result in Python:
def add(x):
    new_x = x + 1
    if new_x > 10:
        return new_x
    else:
        add(new_x)
you must use return in both branches of the if statement, so it should be
def add(x):
    new_x = x + 1
    if new_x > 10:
        return new_x
    else:
        return add(new_x)
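As a side note I am adding here (it goes beyond the original answer): deep recursion like this can also hit Python's default recursion limit of about 1000 frames, so an equivalent iterative loop is a safer pattern. A minimal sketch, with gradient_descent_iterative as a name of my own and grad_f as in the corrected code below:
import numpy as np

x = np.array([1, 2, 3, 4])
y = np.array([1, 1, 2, 2])

def grad_f(v):
    a, b = v[0, 0], v[1, 0]
    return -np.array([np.mean(2 * (y - (a * x + b)) * x),
                      np.mean(2 * (y - (a * x + b)))]).reshape(2, 1)

def gradient_descent_iterative(v, alpha=0.1, tol=1e-3):
    # Same update rule as the recursive version, but a loop cannot
    # overflow the call stack no matter how many steps it takes.
    while True:
        v_next = v - alpha * grad_f(v)
        if (np.abs(v_next - v) <= tol).all():
            return v_next
        v = v_next

print(gradient_descent_iterative(np.array([[1.0], [1.0]])))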
There is also a minor issue with the alpha coefficient: for these particular data points, alpha = 0.2 is too big for the algorithm to converge, so you need to use a smaller alpha (see the sketch below for one way to bound it)
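To make "too big" concrete (this bound is my addition, not part of the original answer): for a quadratic loss like MSE, plain gradient descent converges only when alpha < 2 / lambda_max, where lambda_max is the largest eigenvalue of the Hessian. A minimal sketch:
import numpy as np

x = np.array([1, 2, 3, 4])

# Hessian of the MSE in (a, b); it is constant because the loss is quadratic.
H = 2 / len(x) * np.array([[np.sum(x ** 2), np.sum(x)],
                           [np.sum(x), len(x)]])
lam_max = np.linalg.eigvalsh(H).max()
print(2 / lam_max)  # ~0.12, so alpha = 0.2 diverges while alpha = 0.1 converges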
I also slightly refactored your initial code using the numpy broadcasting convention (https://numpy.org/doc/stable/user/basics.broadcasting.html) to get the following result:
import numpy as np
# Data points
x = np.array([1, 2, 3, 4])
y = np.array([1, 1, 2, 2])
# MSE function
f = lambda a, b: np.mean(np.power(y - (a * x + b), 2))
# Gradient
def grad_f(v_coefficients):
    a = v_coefficients[0, 0]
    b = v_coefficients[1, 0]
    return -np.array([np.mean(2 * (y - (a * x + b)) * x),
                      np.mean(2 * (y - (a * x + b)))]).reshape(2, 1)
# Gradient descent with tol as the tolerance and alpha as the step/learning rate
def gradient_decent(v_prev):
    tol = 1e-3
    # epsilon = np.array([tol * np.ones([2, 1], int)]) is not needed, due to numpy broadcasting rules
    alpha = 0.1
    v_next = v_prev - alpha * grad_f(v_prev)
    if (np.abs(v_next - v_prev) <= tol).all():
        return v_next
    else:
        return gradient_decent(v_next)
# v_0 is the initial guess
v_0 = np.array([[1], [1]])
gradient_decent(v_0)
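As a sanity check (my addition, not part of the original answer), the coefficients found by gradient descent can be compared with the closed-form least-squares fit, which np.polyfit computes directly:
import numpy as np

x = np.array([1, 2, 3, 4])
y = np.array([1, 1, 2, 2])

# A degree-1 polyfit solves the same least-squares problem in closed form,
# so gradient_decent should land close to a = 0.4, b = 0.5.
a, b = np.polyfit(x, y, 1)
print(a, b)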
Upvotes: 1