Mark Lavin

Reputation: 1242

sklearn linear regression for 2D array

I have a Numpy 2D array in which the rows are individual time series and the columns correspond to the time points. I would like to fit a regression line to each of the rows to measure the trends of each time series, which I guess I could do (inefficiently) with a loop like:

array2D = ...
for row in array2D:
    coeffs = sklearn.linear_model.LinearRegression().fit( row[:, None], range( len( row ) ) ).coef_
    ...

Is there a way to do this without a loop? What is the resulting shape of coeffs?

Upvotes: 1

Views: 4854

Answers (2)

Bernard Agbemadon

Reputation: 109

For those like me who prefer the range as X and the time-series data as y:

(image: the closed-form least-squares solution, coeffs = (XᵀX)⁻¹ Xᵀ y)

def linear_fit(periods, timeseries):
    # just format and add one column initialized at 1
    X_mat=np.vstack((np.ones(len(periods)), periods)).T
    
    # cf formula : linear-regression-using-matrix-multiplication
    tmp = np.linalg.inv(X_mat.T.dot(X_mat)).dot(X_mat.T)
    return tmp.dot(timeseries.T)[1] # 1 => coef_, 0 => intercept_

X = np.arange(8) # the periods, or the common range for all time series

y = np.array([ # time series
    [0., 0., 0., 0., 0., 73.92, 0., 114.32],
    [0., 0., 0., 0., 0., 73.92, 0., 114.32],
    [0., 10., 20., 30., 40., 50., 60., 70.]
])

>>> linear_fit(X, y)
array([12.16666667, 12.16666667, 10.        ])

PS: this approach (linear regression via matrix multiplication) is a gold mine for large datasets.
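As a cross-check (not part of the original answer): `np.polyfit` accepts a 2-D `y` and fits one polynomial per column, so the same slopes can be obtained in a single call with the series stacked as columns.

```python
import numpy as np

X = np.arange(8)  # the common periods
y = np.array([
    [0., 0., 0., 0., 0., 73.92, 0., 114.32],
    [0., 0., 0., 0., 0., 73.92, 0., 114.32],
    [0., 10., 20., 30., 40., 50., 60., 70.]
])

# Degree-1 fit per column of y.T; row 0 of the result holds the slopes
# (polyfit returns coefficients highest degree first).
slopes = np.polyfit(X, y.T, 1)[0]
```

This should match the `linear_fit` output above.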

Upvotes: 2

user2653663

Reputation: 2948

The coefficients that minimize the linear regression error are

(image: the normal equation, coeffs = (XᵀX)⁻¹ Xᵀ y)

You can solve for all rows in one go using numpy.

import numpy as np
from sklearn.linear_model import LinearRegression

def solve(timeseries):

    n_samples = timeseries.shape[1]
    # slope and offset/bias
    n_features = 2
    n_series = timeseries.shape[0]

    # For a single time series, X would be of shape
    # (n_samples, n_features) however in this case
    # it will be (n_samples, n_features, n_series).
    # The bias is added by having features being all 1's
    X = np.ones((n_samples, n_features, n_series))
    X[:, 1, :] = timeseries.T

    y = np.arange(n_samples)

    # A is the matrix to be inverted and will
    # be of shape (n_series, n_features, n_features)
    A = X.T @ X.transpose(2, 0, 1)
    A_inv = np.linalg.inv(A) 

    # Do the other multiplications step by step
    B = A_inv @ X.T
    C = B @ y 

    # Return only the slopes (which is what .coef_ does in sklearn)
    return C[:,1]

array2D = np.random.random((3,10))
coeffs_loop = np.empty(array2D.shape[0])
for i, row in enumerate(array2D):
    coeffs = LinearRegression().fit( row[:,None], range( len( row ) )).coef_
    coeffs_loop[i] = coeffs

coeffs_vectorized = solve(array2D)

print(np.allclose(coeffs_loop, coeffs_vectorized))
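A variant sketch of the same batched normal-equation solve (my rewording, same orientation as the question: X = series values, y = time index) that uses `np.linalg.solve`, which broadcasts over the leading axis, instead of forming the explicit inverse:

```python
import numpy as np

def solve_batched(timeseries):
    # Stack one design matrix per series: (n_series, n_samples, 2),
    # column 0 is the intercept column of ones, column 1 the values.
    n_series, n_samples = timeseries.shape
    X = np.ones((n_series, n_samples, 2))
    X[:, :, 1] = timeseries
    y = np.arange(n_samples, dtype=float)

    Xt = X.transpose(0, 2, 1)                 # (n_series, 2, n_samples)
    A = Xt @ X                                # (n_series, 2, 2) normal matrices
    b = Xt @ y[:, None]                       # (n_series, 2, 1)
    coeffs = np.linalg.solve(A, b)[:, :, 0]   # (n_series, 2)
    return coeffs[:, 1]                       # slopes only, shape (n_series,)
```

So to answer the shape question: the result is one slope per row, i.e. shape `(n_series,)`.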

Upvotes: 3
