Following up from this post: Calculate ranks for each group
df <- ddply(df, .(type), transform, pos = rank(x, ties.method = "min")-1)
Using the method described in the above post, when you have multiple ties within the same type, the ranking output (Pos) gets a little messy and hard to interpret, though it is technically still accurate.
For example:
library(plyr)
df <- data.frame(type = c(rep("a",11), rep("b",6), rep("c",2), rep("d", 6)),
x = c(50:53, rep(54, 3), 55:56, rep(57, 2), rep(51,3), rep(52,2), 56,
53, 57, rep(52, 2), 54, rep(58, 2), 70))
df <- ddply(df, .(type), transform, pos = rank(x, ties.method = "min") - 1)
Produces:
Type X Pos
a 50 0
a 51 1
a 52 2
a 53 3
a 54 4
a 54 4
a 54 4
a 55 7
a 56 8
a 57 9
a 57 9
b 51 0
b 51 0
b 51 0
b 52 3
b 52 3
b 56 5
c 53 0
c 57 1
d 52 0
d 52 0
d 54 2
d 58 3
d 58 3
d 70 5
The Pos relative ranking is correct (equal values are ranked the same, lower values ranked lower, and higher values ranked higher), but I have been trying to make the output look prettier. Any thoughts?
I'd like to get the output to look like this:
Type X Pos
a 50 1
a 51 2
a 52 3
a 53 4
a 54 5
a 54 5
a 54 5
a 55 6
a 56 7
a 57 8
a 57 8
b 51 1
b 51 1
b 51 1
b 52 2
b 52 2
b 56 3
c 53 1
c 57 2
d 52 1
d 52 1
d 54 2
d 58 3
d 58 3
d 70 4
This format, of course, assumes that the total number of records for each group doesn't matter. By taking away the "-1", we can remove the 0's, but that only solves one aspect: the gaps after tied values remain. I've tried different formulas and ties.method values, but to no avail.
Maybe the rank() function isn't what I should be using?
Upvotes: 1
Views: 231
Reputation: 17299
It seems you are looking for dense-rank:
library(data.table)
as.data.table(df)[, pos := frank(x, ties.method = 'dense'), by = 'type'][]
# type x pos
# 1: a 50 1
# 2: a 51 2
# 3: a 52 3
# 4: a 53 4
# 5: a 54 5
# 6: a 54 5
# 7: a 54 5
# 8: a 55 6
# 9: a 56 7
# 10: a 57 8
# 11: a 57 8
# 12: b 51 1
# 13: b 51 1
# 14: b 51 1
# 15: b 52 2
# 16: b 52 2
# 17: b 56 3
# 18: c 53 1
# 19: c 57 2
# 20: d 52 1
# 21: d 52 1
# 22: d 54 2
# 23: d 58 3
# 24: d 58 3
# 25: d 70 4
# type x pos
dense_rank() in dplyr does the same thing:
library(dplyr)
df %>% group_by(type) %>% mutate(pos = dense_rank(x)) %>% ungroup()
# # A tibble: 25 x 3
# type x pos
# <fctr> <dbl> <int>
# 1 a 50 1
# 2 a 51 2
# 3 a 52 3
# 4 a 53 4
# 5 a 54 5
# 6 a 54 5
# 7 a 54 5
# 8 a 55 6
# 9 a 56 7
# 10 a 57 8
# # ... with 15 more rows
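If you would rather not add a package, a dense rank can also be sketched in base R. This is only a minimal sketch, not taken from either package: dense_rank_base is a hypothetical helper name, and it assumes that a value's dense rank is its position among the sorted distinct values of its group.

```r
# Rebuild the example data from the question.
df <- data.frame(
  type = c(rep("a", 11), rep("b", 6), rep("c", 2), rep("d", 6)),
  x = c(50:53, rep(54, 3), 55:56, rep(57, 2), rep(51, 3), rep(52, 2), 56,
        53, 57, rep(52, 2), 54, rep(58, 2), 70)
)

# Hypothetical helper (not part of any package): the dense rank of a value
# is its position among the sorted distinct values of the vector.
dense_rank_base <- function(v) match(v, sort(unique(v)))

# ave() applies the helper within each level of type and keeps the row order.
df$pos <- ave(df$x, df$type, FUN = dense_rank_base)
```

For group b (51, 51, 51, 52, 52, 56) this gives 1, 1, 1, 2, 2, 3, whereas rank(x, ties.method = "min") gives 1, 1, 1, 4, 4, 6 because "min" skips one position for every tied row.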