Christian Million

Reputation: 692

Calculate Ranks for Each Group, but counting ties as 1

Following up from this post: Calculate ranks for each group

df <- ddply(df, .(type), transform, pos = rank(x, ties.method = "min")-1)

Using the method described in the above post, when you have multiple ties within the same type, the ranking output (Pos) gets a little messy and hard to interpret, though it is technically still accurate.

For example:

library(plyr)
df <- data.frame(type = c(rep("a",11), rep("b",6), rep("c",2), rep("d", 6)), 
                    x = c(50:53, rep(54, 3), 55:56, rep(57, 2), rep(51,3), rep(52,2), 56,
                          53, 57, rep(52, 2), 54, rep(58, 2), 70))
df <- ddply(df, .(type), transform, pos = rank(x, ties.method = "min") - 1)

Produces:

Type    X    Pos
a       50   0
a       51   1
a       52   2
a       53   3
a       54   4
a       54   4
a       54   4
a       55   7
a       56   8
a       57   9
a       57   9
b       51   0
b       51   0
b       51   0
b       52   3
b       52   3
b       56   5
c       53   0
c       57   1
d       52   0
d       52   0
d       54   2
d       58   3
d       58   3
d       70   5

The Pos relative ranking is correct (equal values are ranked the same, lower values ranked lower, and higher values ranked higher), but I have been trying to make the output look prettier. Any thoughts?

I'd like to get the output to look like this:

Type    X    Pos
a       50   1
a       51   2
a       52   3
a       53   4
a       54   5
a       54   5
a       54   5
a       55   6
a       56   7
a       57   8
a       57   8
b       51   1
b       51   1
b       51   1
b       52   2
b       52   2
b       56   3
c       53   1
c       57   2
d       52   1
d       52   1
d       54   2
d       58   3
d       58   3
d       70   4

This format, of course, assumes that the total number of records in each group doesn't matter. Taking away the "-1" removes the 0's, but that only solves one aspect: the ranks still skip values after a tie. I've tried playing around with different expressions and ties.method values, but to no avail.

Maybe the rank() function isn't what I should be using?
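
For illustration, here is a small sketch of how some of rank()'s ties.method options treat the group "b" values from the example. The vector name b_x is made up for this snippet. None of these options gives consecutive ranks after a tie, which is the gap described above:

```r
# Values of x for type "b" in the example data
b_x <- c(51, 51, 51, 52, 52, 56)

# Each tie group gets the lowest rank it spans; later ranks skip ahead
r_min <- rank(b_x, ties.method = "min")      # 1 1 1 4 4 6

# Each tie group gets the highest rank it spans
r_max <- rank(b_x, ties.method = "max")      # 3 3 3 5 5 6

# Each tie group gets the mean of the ranks it spans (the default)
r_avg <- rank(b_x, ties.method = "average")  # 2 2 2 4.5 4.5 6

# Ties are broken by order of appearance, so equal values differ
r_first <- rank(b_x, ties.method = "first")  # 1 2 3 4 5 6
```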

Upvotes: 1

Views: 231

Answers (1)

mt1022

Reputation: 17299

It seems you are looking for dense-rank:

library(data.table)
as.data.table(df)[, pos := frank(x, ties.method = 'dense'), by = 'type'][]
#     type  x pos
# 1:    a 50   1
# 2:    a 51   2
# 3:    a 52   3
# 4:    a 53   4
# 5:    a 54   5
# 6:    a 54   5
# 7:    a 54   5
# 8:    a 55   6
# 9:    a 56   7
# 10:    a 57   8
# 11:    a 57   8
# 12:    b 51   1
# 13:    b 51   1
# 14:    b 51   1
# 15:    b 52   2
# 16:    b 52   2
# 17:    b 56   3
# 18:    c 53   1
# 19:    c 57   2
# 20:    d 52   1
# 21:    d 52   1
# 22:    d 54   2
# 23:    d 58   3
# 24:    d 58   3
# 25:    d 70   4
# type  x pos

dense_rank in dplyr does the same thing:

library(dplyr)
df %>% group_by(type) %>% mutate(pos = dense_rank(x)) %>% ungroup()
# # A tibble: 25 x 3
#      type     x   pos
#    <fctr> <dbl> <int>
#  1      a    50     1
#  2      a    51     2
#  3      a    52     3
#  4      a    53     4
#  5      a    54     5
#  6      a    54     5
#  7      a    54     5
#  8      a    55     6
#  9      a    56     7
# 10      a    57     8
# # ... with 15 more rows
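
If adding a package is not an option, a dense rank can also be sketched in base R. This is not taken from the packages above; it is a minimal alternative that assumes x has no NA values. Within each type, every value is matched against the sorted unique values of that group, so ties share a rank and no ranks are skipped:

```r
# Rebuild the example data from the question
df <- data.frame(type = c(rep("a", 11), rep("b", 6), rep("c", 2), rep("d", 6)),
                 x = c(50:53, rep(54, 3), 55:56, rep(57, 2), rep(51, 3), rep(52, 2), 56,
                       53, 57, rep(52, 2), 54, rep(58, 2), 70))

# ave() applies FUN to x separately within each level of type.
# match() returns each value's position among the group's sorted
# unique values, which is exactly its dense rank.
df$pos <- ave(df$x, df$type, FUN = function(v) match(v, sort(unique(v))))
```

Because ave() returns a vector of the same type as its first argument, pos is stored as a double here; wrap the result in as.integer() if an integer column is preferred.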

Upvotes: 0
