I'm using sklearn's LinearRegression (reference: http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html), but there is no option to constrain the regression coefficients.
Does anyone know of another package in Python that performs multiple-variable linear regression and constrains the regression coefficients to be greater than 0?
Here is the code I have so far.
'''data:
date A B C
10/30/2015 0.063363323 -0.005218807 0.079777558
11/30/2015 -0.013171244 -0.008727599 0.010352028
12/31/2015 -0.017551268 8.09E-05 -0.020491923
1/29/2016 -0.042606469 0.052272139 -0.080362246
2/29/2016 -0.015224562 0.031250961 0.029988488
3/31/2016 0.058291876 -0.000238614 0.056727336
4/29/2016 0.000505675 -0.005325338 0.02854057
5/31/2016 0.012766515 0.008548162 -0.001631845
6/30/2016 -0.038981203 0.064236963 0.00570145
7/29/2016 0.033715429 0.024269606 0.02703294
8/31/2016 -0.002083837 -0.009439625 0.004129397
9/30/2016 -0.009825674 -0.01737909 -0.019251885
11/30/2016 0.0084733 -0.11668582 0.031928726
12/30/2016 0.017084282 -0.005553088 0.029372131
1/31/2017 0.014263947 0.004036504 0.00187079
2/28/2017 0.037375566 0.016081105 0.039331615
3/31/2017 -0.002494984 -0.005942793 -0.002097504
4/28/2017 -0.005054922 0.015685226 0.008243977
5/31/2017 0.002285393 0.020771375 0.002697755
6/30/2017 0.002841457 0.004886117 0.019202011
7/31/2017 0.014866638 -0.006900926 0.010126577
8/31/2017 -0.016647997 0.035687133 -0.008709075
9/29/2017 0.019523651 -0.022154361 0.020468398
10/31/2017 0.019407629 -0.000705663 0.016574416
11/30/2017 0.027486425 0.008008173 0.033427299
12/29/2017 0.007861222 0.018095096 0.017908809
1/31/2018 0.058702838 -0.032765285 0.05
'''
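The snippets below assume this sample already lives in a DataFrame df; a minimal loading sketch (the file name data.txt is an assumption):

import pandas as pd

# Hypothetical setup: the sample above saved as 'data.txt'
# (whitespace-separated, first row is the header).
df = pd.read_csv("data.txt", sep=r"\s+", parse_dates=["date"], index_col="date")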
from sklearn import linear_model

# Unconstrained OLS (no intercept); note the second coefficient comes out negative.
reg = linear_model.LinearRegression(fit_intercept=False)
reg.fit(df[['B', 'C']], df['A'])
print(reg.coef_)
# [ 0.67761268 -0.08845756]
Working code below:

import numpy as np
from scipy.optimize import lsq_linear

# Bounded least squares: scalar bounds apply to every coefficient,
# so this constrains both coefficients to [0, +inf).
lb = 0
ub = np.inf
res = lsq_linear(df[['B', 'C']], df['A'], bounds=(lb, ub))
print(res.x)
Upvotes: 2
Views: 2593
For people arriving at this answer in 2021: you can now set positive=True when calling LinearRegression, so that the coefficients are constrained to be non-negative.
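A minimal sketch, assuming scikit-learn >= 0.24 (where the positive parameter was introduced):

from sklearn.linear_model import LinearRegression

# positive=True solves a non-negative least squares problem internally,
# so every fitted coefficient is >= 0.
reg = LinearRegression(fit_intercept=False, positive=True)
reg.fit(df[['B', 'C']], df['A'])
print(reg.coef_)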
Upvotes: 2
sklearn is just wrapping scipy's lstsq, which does not support bound constraints.
You can easily modify sklearn's code, though:
if sp.issparse(X):
    if y.ndim < 2:
        out = sparse_lsqr(X, y)
        self.coef_ = out[0]
        self._residues = out[3]
    else:
        # sparse_lstsq cannot handle y with shape (M, K)
        outs = Parallel(n_jobs=n_jobs_)(
            delayed(sparse_lsqr)(X, y[:, j].ravel())
            for j in range(y.shape[1]))
        self.coef_ = np.vstack(out[0] for out in outs)
        self._residues = np.vstack(out[3] for out in outs)
else:
    self.coef_, self._residues, self.rank_, self.singular_ = \
        linalg.lstsq(X, y)
    self.coef_ = self.coef_.T
Just replace lstsq / lsqr with scipy's nnls (dense only!) or lsq_linear with manually set bounds. For the large-scale case, optimize.minimize with method L-BFGS-B is another candidate, although you need to supply the gradient, and there are at least two common formulations, e.g. one using the precomputed A.T @ A (which loses sparseness).
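As a standalone alternative to patching sklearn, a sketch of the dense NNLS route (reusing the df from the question):

from scipy.optimize import nnls

# nnls solves min ||Ax - b||_2 subject to x >= 0 (dense input only).
A = df[['B', 'C']].to_numpy()
b = df['A'].to_numpy()
coef, rnorm = nnls(A, b)
print(coef)   # non-negative coefficients
print(rnorm)  # residual norm ||A @ coef - b||_2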
Remark: these methods minimize different objective functions (norm vs. squared norm; a 0.5 factor vs. a 1.0 factor). This does not change the result in terms of the vector found, but the reported objective value of course looks different, and you should take care of this (if needed).
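To see that scaling difference concretely, a small check (per SciPy's documented conventions: nnls returns the residual norm, while lsq_linear's cost is half the squared norm):

import numpy as np
from scipy.optimize import lsq_linear, nnls

A = df[['B', 'C']].to_numpy()
b = df['A'].to_numpy()

_, rnorm = nnls(A, b)                        # reports ||Ax - b||_2
res = lsq_linear(A, b, bounds=(0, np.inf))   # minimizes 0.5 * ||Ax - b||_2**2
print(np.isclose(res.cost, 0.5 * rnorm**2))  # True: same minimizer, different objective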
Upvotes: 3