Efficient way to create rank variables in R

Question

I would like to create several ranking variables in my data frame. First, I would like to know the best way to compute the rank.

Let's say I have data like this:

grp<-c("sw","sw","sw","sl","sl","sl","sw","sl")
val<-c(12,2,6,4,9,15,6,4)
df<-cbind.data.frame(grp,val)

I want the data ranked so that there are no breaks in rank, but with ties averaged. The result should look like this (sorted using df[order(df$val), ]):

  grp val rk
2  sw   2  1
4  sl   4  2.5
8  sl   4  2.5
3  sw   6  3.5
7  sw   6  3.5
5  sl   9  4
1  sw  12  5
6  sl  15  6

I know how to get no breaks in order (by using dense_rank) and how to get ties averaged (using rank), but not how to get both. Dense rank does not appear to have any arguments that would let you specify what to do with ties.
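To make the gap concrete, here is a small sketch of what each existing approach returns on the sample data. It assumes base R only; the dense rank is computed with match() as a stand-in for dplyr's dense_rank, and the variable names are illustrative.

```r
val <- c(12, 2, 6, 4, 9, 15, 6, 4)

# Averaged ties, but with gaps: the two 4s share 2.5 and the two 6s share 4.5.
avg_rank <- rank(val, ties.method = "average")

# No gaps, but ties simply share an integer (base R stand-in for dense_rank).
dense <- match(val, sort(unique(val)))

# Show both side by side, sorted by val.
print(data.frame(val, avg_rank, dense)[order(val), ])
```

Neither column matches the rk column above: avg_rank jumps from 2.5 to 4.5, while dense has no fractional part for the tied values.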

I would like something that I can apply easily to multiple columns if possible.

Iroha · Accepted Answer

In base R, assuming the data is already sorted by val:

with(df, ave(cumsum(!duplicated(val)), val, FUN = function(x) x + (length(x) > 1) / length(x)))

[1] 1.000000 2.333333 2.333333 2.333333 3.500000 3.500000 4.000000 5.000000 6.000000
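The one-liner can be easier to follow when split into steps. The sketch below uses the val column from the altered data given at the end of this answer; the intermediate names (dense, n_ties) are illustrative only.

```r
# Values from the altered data, already sorted ascending.
val <- c(2L, 4L, 4L, 4L, 6L, 6L, 9L, 12L, 15L)

# Step 1: dense rank. On sorted input, each first occurrence bumps the counter.
dense <- cumsum(!duplicated(val))

# Step 2: size of each tie group, aligned with val.
n_ties <- ave(val, val, FUN = length)

# Step 3: tied values get an offset of 1/n; unique values get no offset.
rk <- dense + (n_ties > 1) / n_ties

print(data.frame(val, dense, n_ties, rk))
```

With two tied values the offset is 0.5, which reproduces the 2.5 and 3.5 from the question; with three tied values it is 1/3, hence the 2.333333 above.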

Or the same idea using dplyr:

library(dplyr)

df %>%
  mutate(rk = dense_rank(val)) %>%
  group_by(val) %>%
  mutate(rk = rk + (n() > 1) / n())

# A tibble: 9 x 3
# Groups:   val [6]
  grp     val    rk
  <chr> <int> <dbl>
1 sw        2  1   
2 sl        4  2.33
3 sl        4  2.33
4 sl        4  2.33
5 sw        6  3.5 
6 sw        6  3.5 
7 sl        9  4   
8 sw       12  5   
9 sl       15  6

Data (slightly altered to add more than a single duplicate):

df <- structure(list(
  grp = c("sw", "sl", "sl", "sl", "sw", "sw", "sl", "sw", "sl"),
  val = c(2L, 4L, 4L, 4L, 6L, 6L, 9L, 12L, 15L),
  rk = c(1, 2.5, 2.5, 2.5, 3.5, 3.5, 4, 5, 6)),
  class = "data.frame",
  row.names = c("2", "4", "9", "8", "3", "7", "5", "1", "6"))
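
For applying this to several columns, one possible sketch is to wrap the same idea in a small function and use lapply. The helper name dense_rank_avg, the data frame name dat and the second column val2 are made up for illustration. Using match() instead of cumsum(!duplicated()) means the input does not have to be sorted first. Missing values are not handled here.

```r
# Hypothetical helper: dense rank plus a 1/n offset for tied values.
dense_rank_avg <- function(x) {
  dense  <- match(x, sort(unique(x)))           # dense rank, no pre-sorting needed
  n_ties <- ave(seq_along(x), x, FUN = length)  # size of each tie group
  dense + (n_ties > 1) / n_ties
}

# Data from the question, plus a made-up second column (val2) for illustration.
dat <- data.frame(
  grp  = c("sw", "sw", "sw", "sl", "sl", "sl", "sw", "sl"),
  val  = c(12, 2, 6, 4, 9, 15, 6, 4),
  val2 = c(3, 3, 1, 7, 7, 7, 5, 2)
)

# Rank every listed column and store the result as <column>_rk.
cols <- c("val", "val2")
dat[paste0(cols, "_rk")] <- lapply(dat[cols], dense_rank_avg)

print(dat[order(dat$val), ])
```

On the question's data, val_rk reproduces the requested rk column (1, 2.5, 2.5, 3.5, 3.5, 4, 5, 6 once sorted by val).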
