Reputation: 1242
I have a Numpy 2D array in which the rows are individual time series and the columns correspond to the time points. I would like to fit a regression line to each of the rows to measure the trends of each time series, which I guess I could do (inefficiently) with a loop like:
array2D = ...
for row in array2D:
    coeffs = sklearn.linear_model.LinearRegression().fit(row.reshape(-1, 1), range(len(row))).coef_
    ...
Is there a way to do this without a loop? And what would the resulting shape of coeffs be?
Upvotes: 1
Views: 4854
Reputation: 109
For those like me who prefer the range as X and the time series data as y:
import numpy as np

def linear_fit(periods, timeseries):
    # add a column of ones so the intercept is fitted too
    X_mat = np.vstack((np.ones(len(periods)), periods)).T
    # cf. formula: linear regression using matrix multiplication
    tmp = np.linalg.inv(X_mat.T.dot(X_mat)).dot(X_mat.T)
    return tmp.dot(timeseries.T)[1]  # index 1 => coef_, index 0 => intercept_
X = np.arange(8)  # the periods, i.e. the common range for all time series
y = np.array([    # the time series
    [0., 0., 0., 0., 0., 73.92, 0., 114.32],
    [0., 0., 0., 0., 0., 73.92, 0., 114.32],
    [0., 10., 20., 30., 40., 50., 60., 70.]
])
linear_fit(X, y)
# array([12.16666667, 12.16666667, 10.        ])
PS: this approach (linear regression via matrix multiplication) is a gold mine for large datasets.
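If you'd rather not write the normal equations by hand, note that np.polyfit accepts a 2D y and fits each column independently, so the same slopes come out of a single call (a sketch using the arrays above):

```python
import numpy as np

X = np.arange(8)
y = np.array([
    [0., 0., 0., 0., 0., 73.92, 0., 114.32],
    [0., 0., 0., 0., 0., 73.92, 0., 114.32],
    [0., 10., 20., 30., 40., 50., 60., 70.]
])

# polyfit fits each column of its y argument, so pass the series as columns;
# row 0 of the result holds the degree-1 coefficients (the slopes)
slopes = np.polyfit(X, y.T, 1)[0]
print(slopes)  # [12.16666667 12.16666667 10.        ]
```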
Upvotes: 2
Reputation: 2948
The coefficients that minimize the linear regression error are given by the normal equation `β = (XᵀX)⁻¹ Xᵀ y`. You can solve it for all rows in one go using numpy.
import numpy as np
from sklearn.linear_model import LinearRegression
def solve(timeseries):
    n_samples = timeseries.shape[1]
    # slope and offset/bias
    n_features = 2
    n_series = timeseries.shape[0]
    # For a single time series, X would be of shape
    # (n_samples, n_features); here it is
    # (n_samples, n_features, n_series).
    # The bias is added by having a feature that is all 1's.
    X = np.ones((n_samples, n_features, n_series))
    X[:, 1, :] = timeseries.T
    y = np.arange(n_samples)
    # A is the matrix to be inverted and has
    # shape (n_series, n_features, n_features)
    A = X.T @ X.transpose(2, 0, 1)
    A_inv = np.linalg.inv(A)
    # Do the other multiplications step by step
    B = A_inv @ X.T
    C = B @ y
    # Return only the slopes (which is what .coef_ holds in sklearn)
    return C[:, 1]
array2D = np.random.random((3, 10))
coeffs_loop = np.empty(array2D.shape[0])
for i, row in enumerate(array2D):
    coeffs = LinearRegression().fit(row[:, None], range(len(row))).coef_
    coeffs_loop[i] = coeffs
coeffs_vectorized = solve(array2D)
print(np.allclose(coeffs_loop, coeffs_vectorized))
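Since only the slopes are needed, the matrix inversion can be avoided entirely: for a simple regression of `y = arange(n)` on each row, the slope is cov(row, t) / var(row), which vectorizes to a few lines (a sketch; `slopes` is a hypothetical helper name, not part of the code above):

```python
import numpy as np

def slopes(array2D):
    # regress t = 0..n-1 on each row; slope = cov(row, t) / var(row)
    t = np.arange(array2D.shape[1])
    xc = array2D - array2D.mean(axis=1, keepdims=True)  # center each row
    tc = t - t.mean()                                   # center the time index
    return (xc @ tc) / (xc ** 2).sum(axis=1)
```

This gives the same values as the loop over `LinearRegression().fit(row[:, None], range(len(row)))`, with no `(n_features, n_features)` inverse to compute.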
Upvotes: 3