Reputation: 9844
In my attempt to dig deeper into the math behind machine learning models, I'm implementing an Ordinary Least Squares algorithm in Python, using vectorization. My references are:
This is what I have now:
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
%matplotlib inline
X, y = datasets.load_diabetes(return_X_y=True)
# We only take one feature (for visualization purposes).
X = X[:, np.newaxis, 2]
# Split the data into training/testing sets
X_train = X[:-20]
X_test = X[-20:]
y_train = y[:-20]
y_test = y[-20:]
# Input data
sns.scatterplot(
x=X_train[:, 0],
y=y_train,
label="train",
edgecolor=None,
color="blue"
)
# To predict
sns.scatterplot(
x=X_test[:, 0],
y=y_test,
label="test",
edgecolor=None,
marker="*",
color="red",
);
class LinearRegression:
"""
Ordinary least squares Linear Regression, fitted with batch gradient descent.
Args:
    learning_rate: Step size used for each gradient descent update.
    tolerance: Convergence threshold on the norm of the gradient.
    standardize: Whether to standardize the features before fitting (not implemented yet).
"""
def __init__(self, learning_rate: float = 0.01, tolerance: float = 1e4, standardize: bool = True):
# TODO: standardize if required
self._learning_rate: float = learning_rate
self._tolerance: float = tolerance
self._standardize: bool = standardize
self._fitted: bool = False
def fit(self, X: np.ndarray, y: np.ndarray) -> None:
"""Fit linear model."""
self._X: np.ndarray = X
self._y: np.ndarray = y[:, np.newaxis]
self._m, self._n = self._X.shape # rows, features
self._weights: np.ndarray = np.zeros((self._n, 1))
self._train()
def predict(self, X: np.ndarray, add_bias: bool = True) -> np.ndarray:
"""Predict using the linear model."""
assert self._fitted, "Model not fitted."
if add_bias:
X = np.c_[np.ones((X.shape[0], 1)), X]
predictions = np.dot(X, self._weights)
return predictions
def _train(self) -> None:
"""
Fit the weights to the training data using batch gradient descent.
Algorithm:
1. Initialize weights.
2. Compute the cost.
3. Calculate the gradient.
4. Update weights.
5. Repeat from 2 until convergence.
"""
# Add bias term
self._X = np.c_[np.ones((self._m, 1)), self._X]
self._weights = np.r_[np.ones((1, 1)), self._weights]
self._fitted = True
converged = False
iterations = 0
while not converged:
iterations += 1
y_hat = self.predict(self._X, add_bias=False)
residuals = self._residuals(self._y, y_hat)
gradients = self._gradients(self._X, residuals)
self._weights -= self._learning_rate * gradients
gradient_magnitude = np.linalg.norm(gradients)
print(gradient_magnitude)
if gradient_magnitude < self._tolerance:
converged = True
print(self._weights)
print(iterations)
def _residuals(self, y: np.ndarray, y_hat: np.ndarray) -> np.ndarray:
residuals = y - y_hat
return residuals
def _gradients(self, X: np.ndarray, residuals: np.ndarray) -> np.ndarray:
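# Gradient of the squared-error cost J(w) = ||y - Xw||^2 with respect to w:
# dJ/dw = -2 * X^T (y - Xw) = -2 * X^T residuals.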
gradients = -2 * np.dot(X.T, residuals)
return gradients
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
clf = LinearRegression()
clf.fit(X_train, y_train)
The problem I'm facing is that my weights keep increasing until I end up with a bunch of NaNs. I've been trying to find out what I'm missing, but so far no luck. I also tried tweaking the tolerance threshold, but I don't think that's the issue; it seems more likely that something is wrong with my math.
Upvotes: 0
Views: 616
Reputation: 1879
Your code actually works fine, except for the learning rate. Just reduce it from 0.01 to e.g. 0.0001 and everything works (I would also reduce the tolerance to something much, much smaller, like 1e-5, to make sure it actually converges to the right solution).
A small example showing that it works:
clf = LinearRegression(learning_rate=0.0001)
clf.fit(X_train, y_train)
b, m = clf._weights[:, 0]
plt.scatter(X_train[:, 0], y_train)
plt.plot([-2, 4], [-2 * m + b, 4 * m + b])
gives a scatter plot of the training data with the fitted regression line running through it.
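If you want to double-check the converged weights, here is a minimal sketch comparing them against the closed-form least-squares solution via np.linalg.lstsq (assuming the same X_train, y_train, and fitted clf from above):
# Closed-form OLS on the bias-augmented design matrix, for comparison.
X_b = np.c_[np.ones((X_train.shape[0], 1)), X_train]
w_exact, *_ = np.linalg.lstsq(X_b, y_train, rcond=None)
print(w_exact)  # should be close to clf._weights[:, 0]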
Linear regression is a convex optimization problem, so you can picture it as placing a ball on a parabola and then moving it towards the bottom by a fixed amount of space multiplied by the slope at your current position. If that fixed amount is small enough, you get closer and closer to the bottom until you reach the optimal position. But if it is too large, you jump from one side of the parabola to the other, and if it is large enough you land somewhere that is actually higher than where you started. Iterate this a few times and you end up in exactly the situation you had.
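A minimal sketch of that intuition, using plain gradient descent on the 1-D parabola f(w) = w**2 (the step sizes here are just illustrative values):
def descend(step, w=3.0, iters=10):
    # f(w) = w**2 has gradient 2*w; take fixed-size steps against it.
    for _ in range(iters):
        w -= step * 2 * w
    return w

print(descend(0.1))  # small step: w shrinks towards the minimum at 0
print(descend(1.1))  # too-large step: w overshoots and grows in magnitude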
Upvotes: 1