Reputation: 151
I am trying to calculate percentiles or quantils for data which considerably scatter. Using the Loess function the mean is nicely presented, however, I cannot get percentile/quantils from this function.
I tried to combine quantreg with loess. This plot shows linear curves instead of loess smoothed curves.
I would like to get a result similar to this:
data(cars)
plot(cars)
lmodel <- loess(cars$dist~cars$speed,span = 0.3, degree = 1)
lpred<-predict(lmodel, newdata= 5:25,se=TRUE)
lines(5:25, lpred$fit,col='#000066',lwd=4)
lines(5:25, lpred$fit - qt(0.975, lpred$df)*lpred$se, lty=2)
lines(5:25, lpred$fit + qt(0.975, lpred$df)*lpred$se, lty=2)
#### combination of quantreg with loess
plot(cars$speed,cars$dist)
xx <- seq(min(cars$speed),max(cars$speed),1)
f <- coef(rq(loess(cars$dist~cars$speed,span = 0.3, degree = 1), tau=c(0.1,0.25,0.5,0.75,0.9)) )
yy <- cbind(1,xx)%*%f
for(i in 1:length(taus)){
lines(xx,yy[,i],col = "gray")
}
I also tried the suggested code, however, I could not change the settings of the smoothing. The lines showed wavy path.
library(quantreg)
data(cars)
taus <- c(0.1, 0.25, 0.5, 0.75, 0.9)
lmodel <- loess(dist ~ speed, data = cars, span = 0.9, degree = 1)
rqmodel <- rq(lmodel, tau = taus, data = cars)
f <- coef(rqmodel)
xx <- seq(min(cars$speed), max(cars$speed), length.out = nrow(cars))
yy <- predict(rqmodel)
plot(cars)
matlines(xx, yy, col = "grey",lwd=3)
The Loess function does not provide data for quantiles as the rg would.
However, the Loess functions allows to get a curve without zigzag. Please see the code snip. What would be the setting for tau=0.5 using the rg function to produce the same results compared with Loess function.
data(cars)
lmodel <- loess(dist ~ speed, data = cars, span = 0.9 )
plot(cars)
lines( x=4:25 , y=predict(lmodel, newdata= data.frame(speed=4:25)) ,col="Blue")
Upvotes: 1
Views: 860
Reputation: 11
The code below (taken from an "answer") is not correct and should not be included in a correct solution. This would provide a 95% confidence interval on a fit, and the probability that interval lands on the true trend line. It does not correspond to a quantile computed from the data within the span of this moving average. A normal based approximation as recommended would require multiplying ls_yy$se.fit by sqrt(ni) where ni is the number of observations in the particular span. Unfortunately loess does not return ni, so this is not a tenable solution unless the span covers the entire dataset and ni can be set to n and there is no heteroskedasticity.
data(cars)
plot(cars)
lmodel <- loess(dist ~ speed, data = cars, span = 0.5, degree = 1)
ls_yy <- predict(lmodel, se = TRUE)
#wrong - this does not denote quantiles for the input data:
ls_yy <- cbind(ls_yy$fit,
ls_yy$fit - 2*ls_yy$se.fit,
ls_yy$fit + 2*ls_yy$se.fit)
plot(cars)
matlines(xx, ls_yy, col = "darkgrey")
We can make this more obvious using a sample dataset with more observations. Samples 1 and 2 are identical, aside from their sample sizes (500 and 1500 observations), and therefore they should have very similar quantiles.
set.seed(1)
x1 = runif(500,0,10)
y1 = x1 + rnorm(length(x1))
x2 = runif(1500,0,10)
y2 = x1 + rnorm(length(x2))
dfpd = data.frame(x=1:9)
lmodel1 <- loess(y ~ x, data = data.frame(x=x1,y=y1), span = 0.5, degree = 1)
ls_yy1 <- predict(lmodel1, newdata=dfpd, se = TRUE)
lmodel2 <- loess(y ~ x, data = data.frame(x=x2,y=y2), span = 0.5, degree = 1)
ls_yy2 <- predict(lmodel2, newdata=dfpd, se = TRUE)
#the only difference between lmodel1 and lmodel2 is the number of observations
#the quantiles should be very similar, but their se values are a function of sample
#size and are thus quite different
ls_yy1$se
ls_yy2$se
ls_yy1$se / ls_yy2$se
We can see that the ratio of se values is around 60% which confirms that they cannot be used as-is for quantile calculations
Upvotes: 1
Reputation: 76605
I believe the code in the question is mixing loess and quantile regressions when they are different methods and the latter does not need the former.
I will try to fit both and plot the respective results.
In the code below I will use matlines
, not a for
loop.
These code lines are common.
library(quantreg)
data(cars)
xx <- seq(min(cars$speed), max(cars$speed), length.out = nrow(cars))
First the loess model.
lmodel <- loess(dist ~ speed, data = cars, span = 0.5, degree = 1)
ls_yy <- predict(lmodel, se = TRUE)
ls_yy <- cbind(ls_yy$fit,
ls_yy$fit - 2*ls_yy$se.fit,
ls_yy$fit + 2*ls_yy$se.fit)
plot(cars)
matlines(xx, ls_yy, col = "darkgrey")
Now quantile regression.
taus <- c(0.1, 0.25, 0.5, 0.75, 0.9)
rqmodel <- rq(dist ~ speed, tau = taus, data = cars)
rq_yy <- predict(rqmodel)
plot(cars)
matlines(xx, rq_yy, col = "darkgrey")
Upvotes: 1