colin

Reputation: 2666

Running percentage least squares regression in R

I am interested in running a percentage least squares regression, rather than ordinary least squares regression in R. This could also be referred to as a linear model with multiplicative error. One question has been asked before regarding percentage least squares on this site, and responders suggested looking into weighted regression, with one possibility being weighting each observation by the inverse square of its X value.

stackoverflow.com/questions/15275236/least-square-percentage-regression

However, this assumes I know how much each observation should be weighted a priori. I don't. I don't know if the percent error is 1%, 10%, 15%, etc. What I want is a model fit as

y = b1*x + e

where the error term is modeled as:

e = b2*x

b2 would be the percentage error that needs to be minimized in the regression model. I have not been able to find any package or any code to fit a model of this type in R. Any feedback on how to do this would be greatly appreciated.
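To make the setting concrete, here is a hedged sketch on simulated data (the variable names, sample size, and the 10% error level are all my own, purely for illustration). The point of weighting by 1/x^2 is that it equalises the *relative* residuals, so the actual size of the percentage error does not need to be known in advance, only that the error scales with x:

```r
# Hypothetical simulated data: error proportional to x (multiplicative error)
set.seed(1)
x  <- runif(50, 1, 100)
b1 <- 2
y  <- b1 * x * (1 + rnorm(50, sd = 0.1))  # roughly 10% relative error

# Weighting by 1/x^2 makes the weighted residuals homoscedastic,
# whatever the true error percentage turns out to be
fit <- lm(y ~ 0 + x, weights = 1/x^2)
coef(fit)  # close to the true slope of 2
```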

Upvotes: 1

Views: 2058

Answers (2)

Roland

Reputation: 132969

I assume you mean percentage regression as defined by Tofallis (2009).

Using his example:

Sales <- c(6375,11626,14655,21869,26408,32406,35108,40295,70762,80553,95294,101314,116141,122316,141650,175026,230614,293543)
Expenses <- c(62.5,92.9,178.3,258.4,494.7,1083,1620.6,421.7,509.2,6620.1,3918.6,1595.3,6107.5,4454.1,3163.8,13210.7,1703.8,9528.2)

If we apply ordinary least squares with sales as the dependent variable we obtain the model Sales = 43942 + 15.00 R&D (R&D expenditure is the Expenses variable here), with p-values of 0.03 and 0.0015 for the intercept and slope respectively.

fit1 <- lm(Sales ~ Expenses)
summary(fit1)
#                Estimate Std. Error t value Pr(>|t|)   
# (Intercept)   43941.705  18493.079   2.376  0.03033 * 
#   Expenses       14.994      3.915   3.830  0.00148 **

If we instead take the logarithm of the dependent variable and carry out ordinary least squares, we obtain the model Ln(Sales) = 10.341 + 0.000198 R&D, with p-values of 0.002 for the slope and essentially zero for the intercept.

fit2 <- lm(log(Sales) ~ Expenses)  
summary(fit2)
#                Estimate Std. Error t value Pr(>|t|)    
# (Intercept)   1.034e+01  2.535e-01  40.793  < 2e-16 ***
#   Expenses    1.982e-04  5.366e-05   3.694  0.00197 **

Finally, we turn to the approach presented in the paper: minimising the squared percentage residuals. After transforming back, the resulting model is Sales = 8817 + 17.88 R&D, with p-values of 0.002 and 5×10⁻⁵ for the intercept and slope respectively.

fit3 <- lm(Sales ~ Expenses, weights = 1/Sales^2)
summary(fit3)
#               Estimate Std. Error t value Pr(>|t|)    
# (Intercept)   8816.553   2421.644   3.641   0.0022 ** 
#   Expenses      17.880      3.236   5.525 4.61e-05 ***

So, in the end, this is weighted regression.
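The equivalence can also be seen algebraically. Dividing the regression equation Sales = a + b*Expenses + e through by Sales turns the squared percentage residuals into ordinary squared residuals, so a plain no-intercept regression recovers the same coefficients (fit3b is my own name for this sketch):

```r
Sales <- c(6375, 11626, 14655, 21869, 26408, 32406, 35108, 40295, 70762,
           80553, 95294, 101314, 116141, 122316, 141650, 175026, 230614, 293543)
Expenses <- c(62.5, 92.9, 178.3, 258.4, 494.7, 1083, 1620.6, 421.7, 509.2,
              6620.1, 3918.6, 1595.3, 6107.5, 4454.1, 3163.8, 13210.7, 1703.8, 9528.2)

# Dividing Sales = a + b*Expenses + e through by Sales gives
# 1 = a*(1/Sales) + b*(Expenses/Sales) + e/Sales,
# so an ordinary no-intercept regression of 1 on 1/Sales and
# Expenses/Sales minimises exactly the squared percentage residuals
fit3b <- lm(rep(1, length(Sales)) ~ 0 + I(1/Sales) + I(Expenses/Sales))
coef(fit3b)  # intercept ~8817 and slope ~17.88, matching the weighted fit
```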

To confirm this, we can also use numeric optimization:

resfun <- function(par) {
  # sum of squared percentage residuals; par = c(slope, intercept)
  sum((Sales - par[[1]]*Expenses - par[[2]])^2 / Sales^2)
}

optim(c(10,1000), resfun)
# $par
# [1]   17.87838 8816.44304

optim(c(10,1000), resfun, method="BFGS")
# $par
# [1]   17.97975 8575.71156

(Different optimizers will give slightly different results.)
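One hedged way to shrink that disagreement (the parscale values below are my own guesses) is to tell optim the rough magnitude of each parameter via its `parscale` control, since the slope and intercept differ by roughly three orders of magnitude:

```r
Sales <- c(6375, 11626, 14655, 21869, 26408, 32406, 35108, 40295, 70762,
           80553, 95294, 101314, 116141, 122316, 141650, 175026, 230614, 293543)
Expenses <- c(62.5, 92.9, 178.3, 258.4, 494.7, 1083, 1620.6, 421.7, 509.2,
              6620.1, 3918.6, 1595.3, 6107.5, 4454.1, 3163.8, 13210.7, 1703.8, 9528.2)
resfun <- function(par) {
  sum((Sales - par[[1]]*Expenses - par[[2]])^2 / Sales^2)
}

# parscale puts the slope (~10) and intercept (~1000s) on comparable
# scales, which improves the numeric gradients that BFGS relies on
res <- optim(c(10, 1000), resfun, method = "BFGS",
             control = list(parscale = c(10, 1000)))
res$par  # should land close to the weighted-lm solution
```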

Upvotes: 4

Greg Snow

Reputation: 49660

Look at the gls function in the nlme package, along with one of the varClasses such as varIdent or varPower.

Possibly a model like:

gls(y ~ x, data = mydata, weights = varPower(form = ~x))
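A self-contained sketch of this on simulated data (all names and the 10% error level are mine, for illustration). The appeal here is that varPower estimates the variance power from the data, so unlike fixed 1/x^2 weights you do not need to assume in advance how the error scales with x:

```r
library(nlme)

set.seed(1)
x <- runif(100, 1, 100)
y <- 2 * x * (1 + rnorm(100, sd = 0.1))  # ~10% multiplicative error

# varPower(form = ~x) models sd(e) as proportional to |x|^delta and
# estimates delta along with the regression coefficients
fit <- gls(y ~ x, weights = varPower(form = ~x))
fit  # the printed variance function should show a power near 1
```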

Upvotes: 0
