bikuser

Reputation: 2103

Linear regression in statsmodels.formula.api (Python)

I am using statsmodels.formula.api to perform linear regression with three independent variables. In some cases the model predicts negative values, but all of the output should be positive.

Is there any way to tell the model that the output can not be negative?

import statsmodels.formula.api as smf

output1 = smf.ols(formula='y ~ A + B + C', data=data).fit()
output = output1.predict(my_data)

Upvotes: 2

Views: 1487

Answers (2)

Josef

Reputation: 22897

One standard way to model a positive or non-negative dependent (response, output) variable is to assume an exponential mean function.

The expected value of the response given the covariates is E(y | x) = exp(x b).

One way to model this is to use Poisson regression, either the statsmodels Poisson model or GLM with family Poisson. Because Poisson is not the correct likelihood for a continuous variable, we need to adjust the covariance of the parameter estimates for the misspecification with cov_type='HC0'. That is, we are using quasi-maximum likelihood estimation.

output1 = smf.poisson(formula= 'y ~A+B+C', data= data).fit(cov_type='HC0')

An alternative would be to take the log of the response variable, which implicitly assumes a lognormal model and requires that every observed y is strictly positive.

http://blog.stata.com/2011/08/22/use-poisson-rather-than-regress-tell-a-friend/
https://stats.stackexchange.com/questions/8505/poisson-regression-vs-log-count-least-squares-regression
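The log-response alternative mentioned above could look roughly like the sketch below. The synthetic data, the seed, the coefficient values, and the `log_y` column name are assumptions made for illustration only; they are not from the question. Note that exponentiating the predicted log gives an estimate of the median of y under the lognormal assumption rather than its mean, so the two approaches do not target exactly the same quantity.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic, strictly positive data (illustrative assumption only).
rng = np.random.default_rng(1)
n = 200
data = pd.DataFrame({
    "A": rng.normal(size=n),
    "B": rng.normal(size=n),
    "C": rng.normal(size=n),
})
log_mean = 0.5 + 0.3 * data["A"] - 0.2 * data["B"] + 0.1 * data["C"]
data["y"] = np.exp(log_mean + rng.normal(scale=0.3, size=n))

# Fit ordinary least squares on log(y); y must be > 0 for this to work.
data["log_y"] = np.log(data["y"])
log_res = smf.ols(formula="log_y ~ A + B + C", data=data).fit()

# Hypothetical new observations to predict for.
my_data = pd.DataFrame({
    "A": [-3.0, 0.0, 3.0],
    "B": [3.0, 0.0, -3.0],
    "C": [0.0, 0.0, 0.0],
})

# Back-transform to the original scale; exp() is always positive.
log_pred = log_res.predict(my_data)
pred = np.exp(log_pred)
```

Because the back-transform is an exponential, `pred` cannot be negative regardless of the covariate values.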

Note that statsmodels does not require the response variable in Poisson, Binomial, Logit and similar models to be an integer, so we can use those models for quasi-maximum likelihood estimation with continuous data.

Upvotes: 3

langelgjm

Reputation: 86

If you are trying to ensure that output values of your model are constrained within some bounds, linear regression is probably not an appropriate choice. It sounds like you might want logistic regression or some kind of model where the output falls within known bounds. Determining what kind of model you want might be a question for CrossValidated.

That being said, you can easily constrain your predictions after the fact - just set all the negative predictions to 0. Whether this makes any sense is a different question.
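Setting negative predictions to 0 after the fact might be done as in this minimal sketch. The `raw_pred` values are made-up stand-ins for what `predict` could return; they are not real model output.

```python
import numpy as np

# Hypothetical raw predictions from a linear model, some of them negative.
raw_pred = np.array([-1.2, 0.0, 3.4, -0.1, 7.8])

# Replace every value below 0 with 0; leave the upper end unbounded.
clipped = np.clip(raw_pred, 0, None)
```

If the predictions come back as a pandas Series, as they do from a formula-based statsmodels fit, `predictions.clip(lower=0)` does the same thing.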

Upvotes: 1
