Reputation: 28169
I'm trying to do a linear regression but I'm only looking to use variables with positive coefficients (I think this is called hard-thresholding, but I'm not certain).
for example:
> summary(lm1)
Call:
lm(formula = value ~ ., data = intCollect1[, -c(1, 3)])
Residuals:
Min 1Q Median 3Q Max
-15.6518 -0.2089 -0.0227 0.2035 15.2235
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.099763 0.024360 4.095 4.22e-05 ***
modelNum3802 0.208867 0.008260 25.285 < 2e-16 ***
modelNum8000 -0.086258 0.013104 -6.582 4.65e-11 ***
modelNum8001 -0.058225 0.010741 -5.421 5.95e-08 ***
modelNum8002 -0.001813 0.012087 -0.150 0.880776
modelNum8003 -0.083646 0.011015 -7.594 3.13e-14 ***
modelNum8004 0.002521 0.010729 0.235 0.814254
modelNum8005 0.301286 0.011314 26.630 < 2e-16 ***
In the above regression, I would only want to use models 3802, 8004 and 8005. Is there a way to do this without copying and pasting each variable name?
Upvotes: 3
Views: 1233
Reputation: 282
You can also reformulate your linear regression model in the following way: label ~ sum(exp(\alpha_i) f_i)
the optimization target will be sum_j (label_j - sum_i(exp(\alpha_i) f_i))^2
This has no closed form solution but can be solved efficiently since it's convex in the \alpha_i's.
Once you compute the \alpha_i's, you can recast them as the regressors of a usual linear model by exponentiating them.
Upvotes: 0
Reputation: 28169
I found the nnls
(non-negative least square) package to be worth looking at.
Upvotes: 1
Reputation: 89097
Instead of using lm
, you can formulate your problem in terms of a Quadratic Programming:
Minimize the sum of the squared replication errors subject to the constraint that your linear coefficients are all positive.
Such problems can be solved using lsei
from the limSolve
package. Looking at your example, it would look a lot like this:
x.variables <- c("modelNum3802", "modelNum8000", ...)
num.var <- length(x.variables)
lsei(A = intCollect1[, x.variables],
B = intCollect1$value,
G = diag(num.var),
H = rep(0, num.var))
Upvotes: 3