linear regression in statsmodels.formula.api python

Question

I am using statsmodels.formula.api to perform linear regression with three independent variables. In some cases the model predicts negative values, but all outputs should be positive.

Is there any way to tell the model that the output cannot be negative?

import statsmodels.formula.api as smf

output1 = smf.ols(formula='y ~ A + B + C', data=data).fit()
output = output1.predict(my_data)

Josef · Accepted Answer

One standard way to model a positive or non-negative dependent (response, output) variable is to assume an exponential mean function.

The expected value of the response given the covariates is E(y | x) = exp(x b).

One way to estimate this is Poisson regression, using either the statsmodels Poisson model or GLM with the Poisson family. Because Poisson is not the correct likelihood for a continuous variable, the covariance of the parameter estimates has to be adjusted for the misspecification, which is done with cov_type='HC0'. That is, we are using quasi-maximum likelihood.

output1 = smf.poisson(formula='y ~ A + B + C', data=data).fit(cov_type='HC0')
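For a runnable version of the line above, here is a minimal sketch. The data frame is synthetic and purely illustrative (the coefficients, the gamma noise, and the contents of `my_data` are assumptions, not taken from the question); only the `smf.poisson(...).fit(cov_type='HC0')` call comes from the answer.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for `data`: a continuous, strictly positive response
# with an exponential mean function (all numbers here are made up).
rng = np.random.default_rng(0)
n = 200
data = pd.DataFrame({
    "A": rng.normal(size=n),
    "B": rng.normal(size=n),
    "C": rng.normal(size=n),
})
mean = np.exp(0.5 + 0.3 * data["A"] - 0.2 * data["B"] + 0.1 * data["C"])
data["y"] = mean * rng.gamma(shape=5.0, scale=1.0 / 5.0, size=n)

# Poisson quasi-maximum likelihood with robust (HC0) standard errors.
result = smf.poisson(formula="y ~ A + B + C", data=data).fit(cov_type="HC0", disp=0)

# Hypothetical new observations, including fairly extreme covariate values.
my_data = pd.DataFrame({
    "A": [-3.0, 0.0, 3.0],
    "B": [3.0, 0.0, -3.0],
    "C": [0.0, 0.0, 0.0],
})

# predict() returns the fitted mean exp(x b), which cannot be negative.
pred = result.predict(my_data)
print(pred)
```

Because the prediction is an exponential of the linear predictor, it stays positive even for covariate values where a plain OLS fit could go below zero.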

An alternative would be to take the log of the response variable, which implicitly assumes a lognormal model.
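The log-response alternative might look like the sketch below. Again the data are synthetic and the column name `log_y` is introduced only for illustration. Note that this approach needs y to be strictly positive, and that exponentiating the prediction of log(y) gives the conditional median under lognormal errors, not the conditional mean.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for `data` with lognormal noise (numbers are made up).
rng = np.random.default_rng(1)
n = 200
data = pd.DataFrame({
    "A": rng.normal(size=n),
    "B": rng.normal(size=n),
    "C": rng.normal(size=n),
})
linpred = 0.5 + 0.3 * data["A"] - 0.2 * data["B"] + 0.1 * data["C"]
data["y"] = np.exp(linpred + rng.normal(scale=0.3, size=n))

# Fit ordinary least squares on the log of the response (requires y > 0).
data["log_y"] = np.log(data["y"])
log_fit = smf.ols(formula="log_y ~ A + B + C", data=data).fit()

# Hypothetical new observations.
my_data = pd.DataFrame({
    "A": [-3.0, 0.0, 3.0],
    "B": [3.0, 0.0, -3.0],
    "C": [0.0, 0.0, 0.0],
})

# Predict on the log scale, then transform back to the original scale.
pred_log = log_fit.predict(my_data)
pred_y = np.exp(pred_log)  # always positive
print(pred_y)
```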

http://blog.stata.com/2011/08/22/use-poisson-rather-than-regress-tell-a-friend/
https://stats.stackexchange.com/questions/8505/poisson-regression-vs-log-count-least-squares-regression

Note that statsmodels does not require the response variable in Poisson, Binomial, Logit and similar models to be an integer, so these models can be used for quasi-maximum likelihood estimation with continuous data.
