Zlo

Reputation: 1170

Determining the quartile of an observation in R

I have a list of observations and their scores. When I look up an observation, e.g.

Var_1["obs_50"]

I get its score:

obs_50 
   12 

Is it possible to also know in which quartile the score of this particular observation lies?

Upvotes: 0

Views: 632

Answers (2)

Samuel Isaacson

Reputation: 365

You can use cut to discretize a vector into quartiles, with the output of quantile as the breaks, e.g.:

set.seed(11)
print(x <- rnorm(20))
## [1] -0.59103110  0.02659437 -1.51655310 -1.36265335  1.17848916 -0.93415132
## [7]  1.32360565  0.62491779 -0.04572296 -1.00412058 -0.82843324 -0.34835173
## [13] -1.53829340 -0.25556525 -1.14994503  0.01232697 -0.22296954  0.88777165
## [19] -0.59215528 -0.65571812
cut(x, breaks = quantile(x, seq(0, 1, by = 0.25)),
    include.lowest = TRUE, labels = FALSE)
## [1] 2 4 1 1 4 2 4 4 3 1 2 3 1 3 1 3 3 4 2 2

If you don't want to discretize, you can use rank instead to get each observation's rank as a fraction of the sample size:

rank(x) / length(x)
## [1] 0.50 0.80 0.10 0.15 0.95 0.30 1.00 0.85 0.70 0.25 0.35 0.55 0.05 0.60 0.20
## [16] 0.75 0.65 0.90 0.45 0.40
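
To tie this back to the question, here is a minimal sketch of looking up a single observation by name. The vector Var_1 below is made-up data standing in for the asker's, and the names obs_1 to obs_8 are hypothetical.

```r
# Hypothetical named vector standing in for the asker's Var_1
Var_1 <- c(obs_1 = 3, obs_2 = 18, obs_3 = 7, obs_4 = 12,
           obs_5 = 1, obs_6 = 20, obs_7 = 9, obs_8 = 15)

# Quartile breaks: min, 25%, 50%, 75%, max
breaks <- quantile(Var_1, seq(0, 1, by = 0.25))

# Integer quartile code (1 to 4) for every observation
quartile <- cut(Var_1, breaks = breaks, include.lowest = TRUE, labels = FALSE)
names(quartile) <- names(Var_1)  # attach names so lookup by name works

quartile["obs_4"]   # score 12 falls in the third quartile

# Non-discretized alternative: rank as a fraction of the sample size
pct_rank <- rank(Var_1) / length(Var_1)
pct_rank["obs_4"]   # 5th smallest of 8 values, i.e. 0.625
```

Computing the codes once for the whole vector and indexing by name afterwards avoids recomputing the breaks for every lookup.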

Upvotes: 1

IRTFM

Reputation: 263362

It would seem that you have a named vector. You would need to calculate the quartile breaks with quantile and then figure out which interval your observation falls in. The findInterval function is useful for the second part.

findInterval(Var_1["obs_50"], quantile(Var_1, c(0, .25, .5, .75, 1)))

I did like the idea of defining an ntile function, as mentioned by @epid10 in his now-deleted dplyr answer.

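 # edited to fix missing parens and to put the max value in the top bin:
 ntile <- function(obs, var, n_breaks) {
   # breaks at n_breaks + 1 equally spaced probabilities from 0 to 1
   breaks <- quantile(var, seq(0, 1, length.out = n_breaks + 1))
   # rightmost.closed = TRUE keeps max(var) in bin n_breaks
   findInterval(obs, breaks, rightmost.closed = TRUE)
 }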
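
A small sketch of why the maximum needs special handling, using made-up data. The vector Var_1 and the helper name quantile_bin are hypothetical. findInterval treats each interval as closed on the left and open on the right, so the maximum lands one bin past the last unless rightmost.closed = TRUE is passed to findInterval itself.

```r
# Hypothetical named vector standing in for the asker's Var_1
Var_1 <- c(obs_1 = 3, obs_2 = 18, obs_3 = 7, obs_4 = 12,
           obs_5 = 1, obs_6 = 20, obs_7 = 9, obs_8 = 15)

# Hypothetical helper: index of the quantile bin that `obs` falls into
quantile_bin <- function(obs, var, n_breaks) {
  breaks <- quantile(var, seq(0, 1, length.out = n_breaks + 1))
  findInterval(obs, breaks, rightmost.closed = TRUE)
}

q_obs <- quantile_bin(Var_1["obs_4"], Var_1, 4)  # 3: score 12 is in the third quartile
q_max <- quantile_bin(max(Var_1), Var_1, 4)      # 4: the maximum stays in the top quartile

# Without rightmost.closed the maximum spills into a fifth bin
q_max_open <- findInterval(max(Var_1), quantile(Var_1, seq(0, 1, by = 0.25)))  # 5
```

Note that cut with its default right = TRUE closes intervals on the right instead, so a score that sits exactly on an interior break can get a different bin from the two approaches.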

Upvotes: 2
