Reputation: 175
I would like to create several ranking variables in my data frame. First, I need to work out the best way to rank.
Let's say I have data like this:
grp<-c("sw","sw","sw","sl","sl","sl","sw","sl")
val<-c(12,2,6,4,9,15,6,4)
df<-cbind.data.frame(grp,val)
I want the data ranked so that there are no gaps in the ranks, but with ties averaged, like this (I sorted the data using df[order(df$val), ]):
grp val rk
2 sw 2 1
4 sl 4 2.5
8 sl 4 2.5
3 sw 6 3.5
7 sw 6 3.5
5 sl 9 4
1 sw 12 5
6 sl 15 6
I know how to get ranks with no gaps (using dense_rank) and how to get ties averaged (using rank), but not how to get both. dense_rank does not appear to have an argument for specifying how ties are handled.
I would like something that I can apply easily to multiple columns if possible.
Upvotes: 2
Views: 221
Reputation: 34761
In base R on pre-ordered data:
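# cumsum(!duplicated(val)) is the dense rank on sorted data;
# ave() then adds 1/n to every tie group of size n > 1
with(df, ave(cumsum(!duplicated(val)), val, FUN = function(x) x + (length(x) > 1) / length(x)))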
[1] 1.000000 2.333333 2.333333 2.333333 3.500000 3.500000 4.000000 5.000000 6.000000
Or the same idea using dplyr:
library(dplyr)
df %>%
  mutate(rk = dense_rank(val)) %>%
  group_by(val) %>%
  mutate(rk = rk + (n() > 1) / n())
# A tibble: 9 x 3
# Groups: val [6]
grp val rk
<chr> <int> <dbl>
1 sw 2 1
2 sl 4 2.33
3 sl 4 2.33
4 sl 4 2.33
5 sw 6 3.5
6 sw 6 3.5
7 sl 9 4
8 sw 12 5
9 sl 15 6
Data (slightly altered to add more than a single duplicate):
df <- structure(list(grp = c("sw", "sl", "sl", "sl", "sw", "sw", "sl",
"sw", "sl"), val = c(2L, 4L, 4L, 4L, 6L, 6L, 9L, 12L, 15L), rk = c(1,
2.5, 2.5, 2.5, 3.5, 3.5, 4, 5, 6)), class = "data.frame", row.names = c("2",
"4", "9", "8", "3", "7", "5", "1", "6"))
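Since the question also asks for something that applies easily to several columns, the same idea can be wrapped in a small function. This is only a sketch: `dense_rank_ties` is a hypothetical helper name, the `val2` column is made up for illustration, and `match(x, sort(unique(x)))` is swapped in for `cumsum(!duplicated(val))` so that the input does not need to be sorted first.

```r
# Hypothetical helper (not part of the answer above): dense rank plus 1/n for ties
dense_rank_ties <- function(x) {
  # Dense rank without requiring sorted input: position among the sorted unique values
  dense <- match(x, sort(unique(x)))
  # Size of the tie group each element belongs to
  n <- ave(seq_along(x), x, FUN = length)
  # Untied values keep their dense rank; a tie group of size n gets + 1/n
  dense + (n > 1) / n
}

# The question's original, unsorted data, plus a made-up second column
df <- data.frame(
  grp  = c("sw", "sw", "sw", "sl", "sl", "sl", "sw", "sl"),
  val  = c(12, 2, 6, 4, 9, 15, 6, 4),
  val2 = c(3, 3, 1, 7, 7, 7, 2, 5)   # hypothetical column for illustration
)

# Rank several columns in one go; new columns are named <col>_rk
cols <- c("val", "val2")
df[paste0(cols, "_rk")] <- lapply(df[cols], dense_rank_ties)
df
```

With two-way ties this gives the 2.5 and 3.5 values the question asks for; a three-way tie gets + 1/3, as in the output shown above.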
Upvotes: 3
Reputation: 33753
Using data.table::frank():
library(data.table)
frank(df$val, ties.method = "dense") + frank(df$val) %% 1
# [1] 1.0 2.5 2.5 3.5 3.5 4.0 5.0 6.0
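Another data.table alternative (this assumes df is already sorted by val, because .GRP numbers the groups in order of appearance):
setDT(df)
# .GRP is the group counter; tied values (.N > 1) get an extra 0.5
df[, rk := .GRP + if (.N > 1L) 0.5 else 0, by = val]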
# grp val rk
# 1: sw 2 1.0
# 2: sl 4 2.5
# 3: sl 4 2.5
# 4: sw 6 3.5
# 5: sw 6 3.5
# 6: sl 9 4.0
# 7: sw 12 5.0
# 8: sl 15 6.0
Reproducible data:
df <- data.frame(
grp = c("sw", "sl", "sl", "sw", "sw", "sl", "sw", "sl"),
val = c(2L, 4L, 4L, 6L, 6L, 9L, 12L, 15L)
)
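One thing to watch with the `%% 1` trick: it relies on the average rank of a tie group having a fractional part, which only happens when the group has an even number of members. The sketch below checks this with base `rank()`, whose default `ties.method = "average"` matches the default of `frank()`, and then shows one possible way to add a fixed 0.5 to every tie group regardless of its size. The vector `x` is made up for illustration.

```r
x <- c(2, 4, 4, 4, 6, 6)   # hypothetical values: a three-way and a two-way tie

# Fractional part of the average rank: the three 4s share rank 3, so they get 0
frac <- rank(x) %% 1
frac
# [1] 0.0 0.0 0.0 0.0 0.5 0.5

# Fixed 0.5 offset for every tie group, whatever its size
dense <- match(x, sort(unique(x)))       # dense rank, works on unsorted input
tied  <- ave(x, x, FUN = length) > 1     # TRUE where the value occurs more than once
rk    <- dense + 0.5 * tied
rk
# [1] 1.0 2.5 2.5 2.5 3.5 3.5
```

For the data in the question, where every tie is a pair, both approaches give the same result.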
Upvotes: 3