Reputation: 2103
I am using statsmodels.formula.api to perform linear regression. I have used three independent variables for prediction. In some cases I am getting negative predicted values, but all the output should be positive.
Is there any way to tell the model that the output cannot be negative?
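import statsmodels.formula.api as smf
# Fit ordinary least squares with three predictors
output1 = smf.ols(formula='y ~ A + B + C', data=data).fit()
# Predict from the fitted results object (output1)
output = output1.predict(my_data)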
Upvotes: 2
Views: 1487
Reputation: 22897
One standard way to model a positive or non-negative dependent (or response or output) variable is by assuming an exponential mean function.
The expected value of the response given the covariates is E(y | x) = exp(x b).
One way to estimate this model is Poisson regression, using either statsmodels Poisson or GLM with family Poisson. Because Poisson is not the correct likelihood for a continuous variable, we need to adjust the covariance of the parameter estimates for the misspecification, with cov_type='HC0'. That is, we are using quasi-maximum likelihood.
output1 = smf.poisson(formula= 'y ~A+B+C', data= data).fit(cov_type='HC0')
An alternative would be to take the log of the response variable, which implicitly assumes a lognormal model.
http://blog.stata.com/2011/08/22/use-poisson-rather-than-regress-tell-a-friend/
https://stats.stackexchange.com/questions/8505/poisson-regression-vs-log-count-least-squares-regression
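The log-response alternative mentioned above can be sketched as follows, again on made-up data. The helper column log_y and all numeric values are assumptions for the example. Note that this approach requires y to be strictly positive, and that exponentiating the predicted log gives the conditional median under the lognormal assumption; the mean needs an extra factor exp(sigma^2 / 2), estimated here from the residual variance.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic lognormal response (illustrative data only).
rng = np.random.default_rng(1)
n = 500
data = pd.DataFrame({
    "A": rng.normal(size=n),
    "B": rng.normal(size=n),
    "C": rng.normal(size=n),
})
linear_part = 0.5 + 0.3 * data["A"] - 0.2 * data["B"] + 0.1 * data["C"]
data["y"] = np.exp(linear_part + rng.normal(scale=0.4, size=n))

# Regress log(y) on the predictors; y must be strictly positive for this.
data["log_y"] = np.log(data["y"])
log_fit = smf.ols("log_y ~ A + B + C", data=data).fit()

my_data = pd.DataFrame({
    "A": [-5.0, 0.0, 5.0],
    "B": [5.0, 0.0, -5.0],
    "C": [-5.0, 0.0, 5.0],
})

# predict() returns fitted log(y); exponentiate to return to the original scale.
pred_log = log_fit.predict(my_data)
pred_median = np.exp(pred_log)  # conditional median under lognormal errors

# Under homoscedastic normal errors, E(y | x) = exp(x b + sigma^2 / 2).
pred_mean = np.exp(pred_log + log_fit.mse_resid / 2)
print(pred_mean)
```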
Note that statsmodels does not require the response variable in Poisson, Binomial, Logit and similar models to be integer-valued, so we can use those models for quasi-maximum likelihood estimation with continuous data.
Upvotes: 3
Reputation: 86
If you are trying to ensure that output values of your model are constrained within some bounds, linear regression is probably not an appropriate choice. It sounds like you might want logistic regression or some kind of model where the output falls within known bounds. Determining what kind of model you want might be a question for CrossValidated.
That being said, you can easily constrain your predictions after the fact - just set all the negative predictions to 0. Whether this makes any sense is a different question.
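A minimal sketch of that after-the-fact clipping, using a hypothetical array of predictions in place of the result of predict():

```python
import numpy as np

# Hypothetical predictions from a linear model; in practice this would be
# the array or Series returned by the fitted model's predict() method.
predictions = np.array([3.2, -0.7, 1.5, -0.1, 0.0])

# Replace every negative prediction with 0 and leave the others unchanged.
clipped = np.clip(predictions, 0, None)
print(clipped)
```

This changes only the reported predictions; the fitted coefficients are still those of the unconstrained linear model.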
Upvotes: 1