Heather Clark

Reputation: 175

Efficient way to create rank variables in R

I would like to create several ranking variables in my data frame. First, though, I need to find the best way to compute the ranks.

Let's say I have data like this:

grp <- c("sw", "sl")[c(1, 1, 1, 2, 2, 2, 1, 2)]
val <- c(12, 2, 6, 4, 9, 15, 6, 4)
df <- cbind.data.frame(grp, val)

I want the data ranked so that there are no gaps in the ranks, but with ties averaged, like this (sorted for display using df[order(df$val), ]):

  grp val rk
2  sw   2  1
4  sl   4  2.5
8  sl   4  2.5
3  sw   6  3.5
7  sw   6  3.5
5  sl   9  4
1  sw  12  5
6  sl  15  6

I know how to get ranks with no gaps (using dense_rank) and how to get ties averaged (using rank), but not how to get both. dense_rank does not appear to have any argument that lets you specify how ties are handled.
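For reference, here is a small base R sketch of the two behaviours on the example values. The match() call is only a dependency-free stand-in for dplyr::dense_rank (an assumption made for illustration, not how dplyr implements it).

```r
# Example values from the question
val <- c(12, 2, 6, 4, 9, 15, 6, 4)

# Ties averaged, but the ranks skip ahead after each tie (base R default)
avg_rank <- rank(val)

# No gaps, but tied values simply share one integer rank.
# match() against the sorted unique values stands in for dense_rank().
dense <- match(val, sort(unique(val)))

print(avg_rank)  # 7.0 1.0 4.5 2.5 6.0 8.0 4.5 2.5
print(dense)     # 5 1 3 2 4 6 3 2
```

Neither result is the gap-free ranking with fractional ties shown in the table above.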

I would like something that I can apply easily to multiple columns if possible.

Upvotes: 2

Views: 221

Answers (2)

Iroha

Reputation: 34761

In base R on pre-ordered data:

with(df, ave(cumsum(!duplicated(val)), val, FUN = function(x) x + (length(x) > 1) / length(x)))

[1] 1.000000 2.333333 2.333333 2.333333 3.500000 3.500000 4.000000 5.000000 6.000000

Or the same idea using dplyr:

library(dplyr)

df %>%
  mutate(rk = dense_rank(val)) %>%
  group_by(val) %>%
  mutate(rk = rk + (n() > 1) / n())

# A tibble: 9 x 3
# Groups:   val [6]
  grp     val    rk
  <chr> <int> <dbl>
1 sw        2  1   
2 sl        4  2.33
3 sl        4  2.33
4 sl        4  2.33
5 sw        6  3.5 
6 sw        6  3.5 
7 sl        9  4   
8 sw       12  5   
9 sl       15  6 
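The question also asks for something that applies easily to several columns. One possible sketch, in base R only, wraps the same rule (dense rank plus (n > 1) / n for a tie group of size n) in a helper that does not need the data to be sorted first. The helper name dense_rank_avg and the second column val2 are hypothetical, introduced only for illustration.

```r
# Dense rank plus a fractional offset for ties (same rule as above).
# Works on unsorted input. dense_rank_avg is a hypothetical helper name.
dense_rank_avg <- function(x) {
  dense  <- match(x, sort(unique(x)))            # gap-free integer rank
  n_tied <- ave(seq_along(x), x, FUN = length)   # size of each tie group
  dense + (n_tied > 1) / n_tied
}

df <- data.frame(
  grp  = c("sw", "sw", "sw", "sl", "sl", "sl", "sw", "sl"),
  val  = c(12, 2, 6, 4, 9, 15, 6, 4),
  val2 = c(3, 3, 1, 7, 5, 5, 5, 2)   # hypothetical second column
)

# Rank every listed column and store the result as <name>_rk
cols <- c("val", "val2")
df[paste0(cols, "_rk")] <- lapply(df[cols], dense_rank_avg)
print(df)
```

On the question's original val column this gives 5, 1, 3.5, 2.5, 4, 6, 3.5, 2.5, which matches the requested ranking row for row. A three-way tie, as in val2, gets an offset of 1/3 rather than 0.5.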

Data (slightly altered to add more than a single duplicate):

df <- structure(
  list(
    grp = c("sw", "sl", "sl", "sl", "sw", "sw", "sl", "sw", "sl"),
    val = c(2L, 4L, 4L, 4L, 6L, 6L, 9L, 12L, 15L),
    rk = c(1, 2.5, 2.5, 2.5, 3.5, 3.5, 4, 5, 6)
  ),
  class = "data.frame",
  row.names = c("2", "4", "9", "8", "3", "7", "5", "1", "6")
)

Upvotes: 3

s_baldur

Reputation: 33753

Using data.table::frank():

library(data.table)
frank(df$val, ties.method = "dense") + frank(df$val) %% 1
# [1] 1.0 2.5 2.5 3.5 3.5 4.0 5.0 6.0
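One caveat worth checking before relying on the fractional-part trick: the average rank of a tie group with an odd number of members is a whole number, so `%% 1` adds nothing for it. The sketch below illustrates this with base R's rank(), whose default "average" tie handling is the same as frank()'s default, and with match() as an assumed stand-in for the dense ranking.

```r
# A three-way tie (4, 4, 4) followed by a two-way tie (6, 6)
val <- c(2, 4, 4, 4, 6, 6)

dense <- match(val, sort(unique(val)))  # 1 2 2 2 3 3
frac  <- rank(val) %% 1                 # average ranks 1, 3, 3, 3, 5.5, 5.5

rk <- dense + frac
print(rk)  # 1.0 2.0 2.0 2.0 3.5 3.5
```

The three tied 4s end up with rank 2, the same as an untied value would get, while the two tied 6s get 3.5. If every tie should carry an offset, a rule based on the tie-group size may be a safer choice.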

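Another data.table alternative, assuming df is already sorted by val:

setDT(df)
# .GRP numbers the groups in order of first appearance; tied groups (.N > 1) get +0.5
df[, rk := .GRP + if (.N > 1L) 0.5 else 0, by = val]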
#    grp val  rk
# 1:  sw   2 1.0
# 2:  sl   4 2.5
# 3:  sl   4 2.5
# 4:  sw   6 3.5
# 5:  sw   6 3.5
# 6:  sl   9 4.0
# 7:  sw  12 5.0
# 8:  sl  15 6.0

Reproducible data:

df <- data.frame(
  grp = c("sw", "sl", "sl", "sw", "sw", "sl", "sw", "sl"),
  val = c(2L, 4L, 4L, 6L, 6L, 9L, 12L, 15L)
)

Upvotes: 3
