Olivia

Reputation: 814

weighted regression sklearn

I'd like to add weights to my training data based on its recency.

If we look at a simple example:

import matplotlib.pyplot as plt
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

X = np.array([1,2,3,4,5,6,7,8,9,10]).reshape(-1,1)
Y = np.array([0.25, 0.5, 0.75, 1, 1.5, 2, 3, 4, 6, 10]).reshape(-1,1)

poly_reg = PolynomialFeatures(degree=2)
X_poly = poly_reg.fit_transform(X)
pol_reg = LinearRegression()
pol_reg.fit(X_poly, Y)

plt.scatter(X, Y, color='red')
plt.plot(X, pol_reg.predict(poly_reg.fit_transform(X)), color='blue')

[plot: red scatter of the data with the blue degree-2 polynomial fit]

Now imagine that the X values are time-based and the Y values are snapshots of a sensor, so we're modeling some behavior over time. I believe the newest data points are the most important, since they are the most indicative of future behavior. I'd like to adjust my model so that the newest data points are weighted the highest.

There is a question about doing this in R: https://stats.stackexchange.com/questions/196653/assigning-more-weight-to-more-recent-observations-in-regression

I'm wondering if sklearn (or any other Python package) has this feature?

This weighted model would have a similar curve but would fit the newer points better. If I want to use the model to predict the future, a non-weighted model will always be too conservative in its predictions, since it isn't as sensitive to the newest data.
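
To make this concrete, here's roughly the kind of recency weighting I have in mind (the decay factor 0.9 is just a placeholder, not something derived from my data):

import numpy as np

n = 10                                 # number of samples, oldest to newest
decay = 0.9                            # placeholder decay factor
w = decay ** np.arange(n - 1, -1, -1)  # newest point gets weight 1
w = w / w.sum()                        # normalize so the weights sum to 1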

Besides the polynomial-regression approach above, I've also used curve_fit to fit a power or exponential function:

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

# Power-law model: y = a * x^b
def func(x, a, b):
    return a * (x ** b)

# Arrays (rather than lists) so func can be evaluated on X when plotting
X = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
Y = np.array([0.25, 0.5, 0.75, 1, 1.5, 2, 3, 4, 6, 10])

# Constrain the exponent b to be at least 1
popt, pcov = curve_fit(func, X, Y, bounds=([-np.inf, 1], [np.inf, np.inf]))
plt.plot(X, func(X, *popt), color='green')

If a solution using func and curve_fit is possible, I'm open to that too, or to any other method. The only caveat is that my real-world data doesn't always suggest a monotonically increasing function, but my ideal fitted model would be one.
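
For instance, if curve_fit's sigma argument can express the weighting (it divides the residuals, so a smaller sigma makes a point count more), then a sketch like this is what I'm after; the linearly increasing weights are placeholders:

import numpy as np
from scipy.optimize import curve_fit

def func(x, a, b):
    return a * (x ** b)

X = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
Y = np.array([0.25, 0.5, 0.75, 1, 1.5, 2, 3, 4, 6, 10])

# Placeholder recency weights: newer points count more
w = np.linspace(0.1, 1.0, len(X))

# curve_fit minimizes sum(((y - f(x)) / sigma)**2), so sigma = 1/sqrt(w)
popt, pcov = curve_fit(func, X, Y, sigma=1.0 / np.sqrt(w),
                       bounds=([-np.inf, 1], [np.inf, np.inf]))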

Upvotes: 3

Views: 6014

Answers (2)

AlexNe

Reputation: 959

Here it is implemented from scratch:

import matplotlib.pyplot as plt
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

#%matplotlib inline

X = np.array([1,2,3,4,5,6,7,8,9,10]).reshape(-1,1)
# Exponential recency weights (heavily favor the newest points), normalized so w.sum() == 1
w = np.exp(X) / np.sum(np.exp(X))

Y = np.array([0.25, 0.5, 0.75, 1, 1.5, 2, 3, 4, 6, 10]).reshape(-1,1)

poly_reg = PolynomialFeatures(degree=2)
#Vandermonde Matrix
X_poly = poly_reg.fit_transform(X)

#Solve Weighted Normal Equation
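# i.e. beta = (X^T W X)^{-1} X^T W y, with W = diag(w)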
A = np.linalg.inv(X_poly.T @ (w*X_poly))
beta = (A @ X_poly.T) @ (w*Y)

# Evaluate the polynomial (NumPy broadcasting keeps this vectorized)
def polynomial(x, coeff):
    y = 0
    for p, c in enumerate(coeff):
        y += c * x**p
    return y

plt.scatter(X, Y, color='red')
plt.plot(X, polynomial(X, beta), color='blue')

# Source: https://en.wikipedia.org/wiki/Weighted_least_squares#Introduction

Note that this does the same as teoML's answer, which is shorter.

Upvotes: 2

teoML

Reputation: 836

I took a look at sklearn's LinearRegression API here and saw that the class's fit() method has the following signature: fit(self, X, y[, sample_weight]). So, as far as I understand, you can pass it a weight vector for your samples.
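
For example, something like this should work (an untested sketch; the linearly increasing weights are made up to favor the newest samples):

import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

X = np.arange(1, 11).reshape(-1, 1)
Y = np.array([0.25, 0.5, 0.75, 1, 1.5, 2, 3, 4, 6, 10])

X_poly = PolynomialFeatures(degree=2).fit_transform(X)

# Hypothetical recency weights: newest sample weighted the most
weights = np.linspace(1, 10, len(X))

model = LinearRegression()
model.fit(X_poly, Y, sample_weight=weights)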

Upvotes: 7
