ECII

Reputation: 10619

Split observations in half

In a plot(x, y), is there any way to plot a line/curve/function that splits the observations into two halves **at every x (see DWin's comment)**? That is, **at around every x (see DWin's comment)**, the same number of observations should lie above and below this line/curve/function. Is there any way to get the (x, y) coordinates, or the function, of this line/curve/function?

As regressing the data is problematic due to outliers, non-normality, etc., I thought a programming method might provide a viable solution without resorting to complicated regression methods.

Thanks a lot

Upvotes: 1

Views: 247

Answers (2)

G. Grothendieck

Reputation: 269451

First generate some test data:

x <- c(1, 1, 1, 2, 2, 3, 3, 3, 3)
y <- seq_along(x)

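Now, assuming the data is sorted by x, calculate the median at each x and plot. (tapply returns the medians in order of the sorted distinct x values, so unique(x) lines up with them only when x is sorted.)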

plot(y ~ x)

m <- tapply(y, x, median)
lines(m ~ unique(x))
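If x is not already sorted, one possible variation is to take the x positions from the names of the tapply result instead of from unique(x). The sketch below uses made-up unsorted data (not from the answer above) and also counts how many points fall above and below the median at each x, which is the property the question asks for.

```r
# Hypothetical unsorted test data, chosen only for illustration
x <- c(3, 1, 2, 3, 1, 3, 2, 1, 3)
y <- c(9, 1, 4, 6, 3, 8, 5, 2, 7)

m  <- tapply(y, x, median)   # medians, ordered by the sorted distinct x values
xm <- as.numeric(names(m))   # x positions recovered from the names of m

plot(y ~ x)
lines(xm, m)

# Median of each point's own x group, aligned with y
med_at_x <- ave(y, x, FUN = median)

# Number of points strictly above / below the median at each x
above <- tapply(y > med_at_x, x, sum)
below <- tapply(y < med_at_x, x, sum)
```

Here xm and m hold the (x, y) coordinates of the splitting line, and above and below should match at every x.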

Upvotes: 4

IRTFM

Reputation: 263301

Implementing Bolker's idea is really quite simple. This just plots the result of the first example for the rq function in the quantreg package:

library(quantreg)
data(stackloss)
fit <- rq(stack.loss ~ Air.Flow, tau = 0.5, data = stackloss)
with(stackloss, plot(Air.Flow, stack.loss))
abline(a = coef(fit)[1], b = coef(fit)[2])
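To see how evenly the median regression line splits the data overall, one could count the observations above, on, and below it. This is only a rough check, assuming the quantreg package is installed; the tolerance tol is an illustrative choice for deciding that a point lies on the line.

```r
library(quantreg)  # assumes the quantreg package is installed

fit <- rq(stack.loss ~ Air.Flow, tau = 0.5, data = stackloss)

# Vertical distance of each observation from the fitted line
res <- stackloss$stack.loss -
  (coef(fit)[1] + coef(fit)[2] * stackloss$Air.Flow)

tol <- 1e-8  # illustrative tolerance for "on the line"
counts <- c(above = sum(res > tol),
            on    = sum(abs(res) <= tol),
            below = sum(res < -tol))
print(counts)
```

The split holds over the whole data set, not separately at each x.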

However that is not an "at every point" solution, so consider this loess approach:

fit <- loess(stack.loss ~ Air.Flow, data = stackloss, family = "symmetric")
plot(stack.loss ~ Air.Flow, data=stackloss)
with(stackloss, lines(sort(unique(Air.Flow)),  
                      predict(fit, data.frame(Air.Flow=sort(unique(Air.Flow))))))

It doesn't do well at the x values where there is only one observation, but it seems to hit pretty close to the median when using the family="symmetric" option.
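One way to check this is to tabulate the loess prediction next to the per-x median and the number of observations at each x. The following is a sketch using base R only; the data frame cmp and its column names are made up for illustration.

```r
fit <- loess(stack.loss ~ Air.Flow, data = stackloss, family = "symmetric")

xs <- sort(unique(stackloss$Air.Flow))

# One row per distinct Air.Flow value; table() and tapply() both order
# their results by the sorted distinct values, so they line up with xs
cmp <- data.frame(
  Air.Flow = xs,
  n        = as.vector(table(stackloss$Air.Flow)),
  median   = as.vector(tapply(stackloss$stack.loss, stackloss$Air.Flow, median)),
  loess    = as.vector(predict(fit, data.frame(Air.Flow = xs)))
)
cmp$diff <- cmp$loess - cmp$median  # how far the smooth is from the median
print(cmp)
```

Rows with n equal to 1 are the ones where the smooth cannot be expected to match the single observed value.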

Upvotes: 2
