Niels
Niels

Reputation: 151

Combining LOESS and Quantreg to caculate percentiles/quantiles for data

I am trying to calculate percentiles or quantils for data which considerably scatter. Using the Loess function the mean is nicely presented, however, I cannot get percentile/quantils from this function.

I tried to combine quantreg with loess. This plot shows linear curves instead of loess smoothed curves.

I would like to get a result similar to this: enter image description here

data(cars)
plot(cars)
lmodel <- loess(cars$dist~cars$speed,span = 0.3, degree = 1)
lpred<-predict(lmodel, newdata= 5:25,se=TRUE)
lines(5:25, lpred$fit,col='#000066',lwd=4)
lines(5:25, lpred$fit - qt(0.975, lpred$df)*lpred$se, lty=2)
lines(5:25, lpred$fit + qt(0.975, lpred$df)*lpred$se, lty=2)


#### combination of quantreg with loess

plot(cars$speed,cars$dist)
xx <- seq(min(cars$speed),max(cars$speed),1)
f <- coef(rq(loess(cars$dist~cars$speed,span = 0.3, degree = 1), tau=c(0.1,0.25,0.5,0.75,0.9)) )
yy <- cbind(1,xx)%*%f
for(i in 1:length(taus)){
  lines(xx,yy[,i],col = "gray")
}


I also tried the suggested code, however, I could not change the settings of the smoothing. The lines showed wavy path.

library(quantreg)
data(cars)
taus <- c(0.1, 0.25, 0.5, 0.75, 0.9)
lmodel <- loess(dist ~ speed, data = cars, span = 0.9, degree = 1)
rqmodel <- rq(lmodel, tau = taus, data = cars)
f <- coef(rqmodel)
xx <- seq(min(cars$speed), max(cars$speed), length.out = nrow(cars))
yy <- predict(rqmodel)
plot(cars)
matlines(xx, yy, col = "grey",lwd=3)

enter image description here

The Loess function does not provide data for quantiles as the rg would.

However, the Loess functions allows to get a curve without zigzag. Please see the code snip. What would be the setting for tau=0.5 using the rg function to produce the same results compared with Loess function.

data(cars)
lmodel <- loess(dist ~ speed, data = cars, span = 0.9 )
plot(cars)
lines( x=4:25 , y=predict(lmodel, newdata= data.frame(speed=4:25)) ,col="Blue")

enter image description here

Upvotes: 1

Views: 860

Answers (2)

Jacob L Strunk
Jacob L Strunk

Reputation: 11

The code below (taken from an "answer") is not correct and should not be included in a correct solution. This would provide a 95% confidence interval on a fit, and the probability that interval lands on the true trend line. It does not correspond to a quantile computed from the data within the span of this moving average. A normal based approximation as recommended would require multiplying ls_yy$se.fit by sqrt(ni) where ni is the number of observations in the particular span. Unfortunately loess does not return ni, so this is not a tenable solution unless the span covers the entire dataset and ni can be set to n and there is no heteroskedasticity.

data(cars)
plot(cars)

lmodel <- loess(dist ~ speed, data = cars, span = 0.5, degree = 1)
ls_yy <- predict(lmodel, se = TRUE)

#wrong - this does not denote quantiles for the input data:
ls_yy <- cbind(ls_yy$fit, 
               ls_yy$fit - 2*ls_yy$se.fit, 
               ls_yy$fit + 2*ls_yy$se.fit)
plot(cars)
matlines(xx, ls_yy, col = "darkgrey")

We can make this more obvious using a sample dataset with more observations. Samples 1 and 2 are identical, aside from their sample sizes (500 and 1500 observations), and therefore they should have very similar quantiles.

set.seed(1)
x1 = runif(500,0,10)
y1 = x1 + rnorm(length(x1))

x2 = runif(1500,0,10)
y2 = x1 + rnorm(length(x2))

dfpd = data.frame(x=1:9)

lmodel1 <- loess(y ~ x, data = data.frame(x=x1,y=y1), span = 0.5, degree = 1)
ls_yy1 <- predict(lmodel1, newdata=dfpd, se = TRUE)

lmodel2 <- loess(y ~ x, data = data.frame(x=x2,y=y2), span = 0.5, degree = 1)
ls_yy2 <- predict(lmodel2, newdata=dfpd, se = TRUE)

#the only difference between lmodel1 and lmodel2 is the number of observations
#the quantiles should be very similar, but their se values are a function of sample
#size and are thus quite different
ls_yy1$se
ls_yy2$se


ls_yy1$se / ls_yy2$se

We can see that the ratio of se values is around 60% which confirms that they cannot be used as-is for quantile calculations

ratio of se values

Upvotes: 1

Rui Barradas
Rui Barradas

Reputation: 76605

I believe the code in the question is mixing loess and quantile regressions when they are different methods and the latter does not need the former.

I will try to fit both and plot the respective results. In the code below I will use matlines, not a for loop.

These code lines are common.

library(quantreg)

data(cars)

xx <- seq(min(cars$speed), max(cars$speed), length.out = nrow(cars))

First the loess model.

lmodel <- loess(dist ~ speed, data = cars, span = 0.5, degree = 1)
ls_yy <- predict(lmodel, se = TRUE)
ls_yy <- cbind(ls_yy$fit, 
               ls_yy$fit - 2*ls_yy$se.fit, 
               ls_yy$fit + 2*ls_yy$se.fit)

plot(cars)
matlines(xx, ls_yy, col = "darkgrey")

enter image description here

Now quantile regression.

taus <- c(0.1, 0.25, 0.5, 0.75, 0.9)
rqmodel <- rq(dist ~ speed, tau = taus, data = cars)

rq_yy <- predict(rqmodel)

plot(cars)
matlines(xx, rq_yy, col = "darkgrey")

enter image description here

Upvotes: 1

Related Questions