Reputation: 9844
In my attempt to dig deeper into the math behind machine learning models, I'm implementing an Ordinary Least Squares algorithm in Python, using vectorization. My references are:
This is what I have now:
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
%matplotlib inline
X, y = datasets.load_diabetes(return_X_y=True)
# We only take one feature (for visualization purposes).
X = X[:, np.newaxis, 2]
# Split the data into training/testing sets
X_train = X[:-20]
X_test = X[-20:]
y_train = y[:-20]
y_test = y[-20:]
# Input data
sns.scatterplot(
x=X_train[:, 0],
y=y_train,
label="train",
edgecolor=None,
color="blue"
)
# To predict
sns.scatterplot(
x=X_test[:, 0],
y=y_test,
label="test",
edgecolor=None,
marker="*",
color="red",
);
class LinearRegression:
"""
Ordinary least squares Linear Regression, fitted with batch gradient descent.
Args:
    learning_rate: Step size used for each gradient descent update.
    tolerance: Convergence threshold on the norm of the gradient.
    standardize: Whether to standardize the features before fitting (not implemented yet).
"""
def __init__(self, learning_rate: float = 0.01, tolerance: float = 1e4, standardize: bool = True):
# TODO: standardize if required
self._learning_rate: float = learning_rate
self._tolerance: float = tolerance
self._standardize: bool = standardize
self._fitted: bool = False
def fit(self, X: np.ndarray, y: np.ndarray) -> None:
"""Fit linear model."""
self._X: np.ndarray = X
self._y: np.ndarray = y[:, np.newaxis]
self._m, self._n = self._X.shape # rows, features
self._weights: np.ndarray = np.zeros((self._n, 1))
self._train()
def predict(self, X: np.ndarray, add_bias: bool = True) -> np.ndarray:
"""Predict using the linear model."""
assert self._fitted, "Model not fitted."
if add_bias:
X = np.c_[np.ones((X.shape[0], 1)), X]
predictions = np.dot(X, self._weights)
return predictions
def _train(self) -> None:
"""
Fit the weights to the training data using batch gradient descent.
Algorithm:
1. Initialize weights.
2. Compute the cost.
3. Calculate the gradient.
4. Update weights.
5. Repeat from 2 until convergence.
"""
# Add bias term
self._X = np.c_[np.ones((self._m, 1)), self._X]
self._weights = np.r_[np.ones((1, 1)), self._weights]
self._fitted = True
converged = False
iterations = 0
while not converged:
iterations += 1
y_hat = self.predict(self._X, add_bias=False)
residuals = self._residuals(self._y, y_hat)
gradients = self._gradients(self._X, residuals)
self._weights -= self._learning_rate * gradients
gradient_magnitude = np.linalg.norm(gradients)
print(gradient_magnitude)
if gradient_magnitude < self._tolerance:
converged = True
print(self._weights)
print(iterations)
def _residuals(self, y: np.ndarray, y_hat: np.ndarray) -> np.ndarray:
residuals = y - y_hat
return residuals
def _gradients(self, X: np.ndarray, residuals: np.ndarray) -> np.ndarray:
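# Gradient of the squared-error cost J(w) = ||y - Xw||^2 with respect to w:
# dJ/dw = -2 * X^T (y - Xw) = -2 * X^T residuals.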
gradients = -2 * np.dot(X.T, residuals)
return gradients
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
clf = LinearRegression()
clf.fit(X_train, y_train)
The problem I'm facing is that my weights keep increasing until I end up with a bunch of NaNs. I've been trying to find out what I'm missing, but so far no luck. I also tried tweaking the tolerance threshold, but I don't think that's the issue; it seems more likely that something is wrong with my math.
Upvotes: 0
Views: 616
Reputation: 1879
Your code actually works fine, except for the learning rate. Just reduce it from 0.01 to e.g. 0.0001 and everything works (I would also reduce the tolerance to something much, much smaller, like 1e-5, to make sure it actually converges to the right solution).
A small example showing that it works:
clf = LinearRegression(learning_rate=0.0001)
clf.fit(X_train, y_train)
b, m = clf._weights[:, 0]
plt.scatter(X_train[:, 0], y_train)
plt.plot([-2, 4], [-2 * m + b, 4 * m + b])
gives a scatter plot of the training data with the fitted regression line running through it.
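If you want to double-check the converged weights, here is a minimal sketch comparing them against the closed-form least-squares solution via np.linalg.lstsq (assuming the same X_train, y_train, and fitted clf from above):
# Closed-form OLS on the bias-augmented design matrix, for comparison.
X_b = np.c_[np.ones((X_train.shape[0], 1)), X_train]
w_exact, *_ = np.linalg.lstsq(X_b, y_train, rcond=None)
print(w_exact)  # should be close to clf._weights[:, 0]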
Linear regression is a convex optimization problem, so you can picture it as placing a ball on a parabola and then moving it towards the bottom by a fixed amount of space multiplied by the slope at your current position. If that fixed amount is small enough, you get closer and closer to the bottom until you reach the optimal position. But if it is too large, you jump from one side of the parabola to the other, and if it is large enough you land somewhere that is actually higher than where you started. Iterate this a few times and you end up in exactly the situation you had.
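A minimal sketch of that intuition, using plain gradient descent on the 1-D parabola f(w) = w**2 (the step sizes here are just illustrative values):
def descend(step, w=3.0, iters=10):
    # f(w) = w**2 has gradient 2*w; take fixed-size steps against it.
    for _ in range(iters):
        w -= step * 2 * w
    return w

print(descend(0.1))  # small step: w shrinks towards the minimum at 0
print(descend(1.1))  # too-large step: w overshoots and grows in magnitude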
Upvotes: 1