RobinMin
RobinMin

Reputation: 473

Calculate rank by group

I have a data frame with a grouping variable 'ID' and some values ('Value'):

dt <- data.frame(
        ID = c('A1','A2','A4','A2','A1','A4','A3','A2','A1','A3'),
        Value = c(4,3,1,3,4,6,6,1,8,4)
    )
dt
#    ID Value
# 1  A1     4
# 2  A2     3
# 3  A4     1
# 4  A2     3
# 5  A1     4
# 6  A4     6
# 7  A3     6
# 8  A2     1
# 9  A1     8
# 10 A3     4

I can calculate an overall rank order of the 'Value' column like this:

dt$Order <- rank(dt$Value, ties.method = "first")
dt
#    ID Value Order
# 1  A1     4     5
# 2  A2     3     3
# 3  A4     1     1
# 4  A2     3     4
# 5  A1     4     6
# 6  A4     6     8
# 7  A3     6     9
# 8  A2     1     2
# 9  A1     8    10
# 10 A3     4     7

But how can I calculate rank order within each 'ID', instead of a global rank order?

#    ID Value  rnk
# 1  A1     4    1
# 2  A2     3    2
# 3  A4     1    1
# 4  A2     3    3
# 5  A1     4    2
# 6  A4     6    2
# 7  A3     6    2
# 8  A2     1    1
# 9  A1     8    3
# 10 A3     4    1

In T-SQL, we can get this done as the following syntax:

RANK() OVER ( [ < partition_by_clause > ] < order_by_clause > )

Any idea?

Upvotes: 14

Views: 18454

Answers (4)

joran
joran

Reputation: 173737

Many options.

If performance is an issue (i.e. very large data), use the data.table package:

library(data.table)
setDT(dt)
# or: dt <- as.data.table(dt)
dt[ , Order := frank(Value, ties.method = "first"), by = ID]

#          ID Value Order
#      <char> <num> <int>
#  1:     A1     4     1
#  2:     A2     3     2
#  3:     A4     1     1
#  4:     A2     3     3
#  5:     A1     4     2
#  6:     A4     6     2
#  7:     A3     6     2
#  8:     A2     1     1
#  9:     A1     8     3
# 10:     A3     4     1

See ?frank for several other ties methods, e.g. "dense"


dplyr:

library(dplyr)
dt %>% group_by(ID) %>% mutate(rnk = row_number(Value))

From ?ranking:

row_number(): equivalent to rank(ties.method = "first")

dplyr has several other ranking functions, e.g. dense_rank


Or in all its gory detail, a base R solution using split, lapply, do.call and rbind:

do.call(rbind, lapply(split(dt, dt$ID), transform,
              Order = rank(Value, ties.method = "first")))

Using ddply from the plyr package:

library(plyr)
ddply(dt, .(ID), transform, Order = rank(Value, ties.method = "first"))
   ID Value Order
1  A1     4     1
2  A1     4     2
3  A1     8     3
4  A2     3     2
5  A2     3     3
6  A2     1     1
7  A3     6     2
8  A3     4     1
9  A4     1     1
10 A4     6     2

See revision history for data.table alternatives for older package versions.

Upvotes: 14

G. Grothendieck
G. Grothendieck

Reputation: 270448

Here are a couple of approaches:

ave This takes each set of Value numbers that have the same ID and applies rank separately to each such set. No packages are used.

Rank <- function(x) rank(x, ties.method = "first")
transform(dt, rank = ave(Value, ID, FUN = Rank))

giving:

   ID Value rank
1  A1     4    1
2  A2     3    2
3  A4     1    1
4  A2     3    3
5  A1     4    2
6  A4     6    2
7  A3     6    2
8  A2     1    1
9  A1     8    3
10 A3     4    1

Note that the above solution keeps the original row order. It could be sorted afterwards if that were desired.

sqldf with RPostgreSQL

# see FAQ #12 on the sqldf github home page for info on sqldf and PostgreSQL
# https://cran.r-project.org/web/packages/sqldf/README.html

library(RPostgreSQL)
library(sqldf)

sqldf('select 
          *, 
          rank() over (partition by "ID" order by "Value") rank 
       from "dt"
')

This solution reorders the rows. It is assumed that that is ok since your example solution did that (but if not append a sequence number column to dt and add an appropriate order by clause to re-order the result back into the sequence number order).

Upvotes: 6

User16
User16

Reputation: 1

You can use the data.table package.

setDT(dt) dt[, Order := rank(Value, ties.method = "first"), by = "ID"] dt <- as.data.frame(dt)

giving the desired output:

   ID Value Order
1  A1     4     1
2  A2     3     2
3  A4     1     1
4  A2     3     3
5  A1     4     2
6  A4     6     2
7  A3     6     2
8  A2     1     1
9  A1     8     3
10 A3     4     1

Upvotes: 0

Tyler Rinker
Tyler Rinker

Reputation: 110072

My way but there's likely better. Never used rank, din't even know about it. Thanks, may be useful.

#Your Data
dt <- data.frame(
    ID = c('A1','A2','A4','A2','A1','A4','A3','A2','A1','A3'),
    Value = c(4,3,1,3,4,6,6,1,8,4)
)
dt$Order <- rank(dt$Value,ties.method= "first")

#My approach
dt$id <- 1:nrow(dt) #needed for ordering and putting things back together
dt <- dt[order(dt$ID),]
dt$Order.by.group <- unlist(with(dt, tapply(Value, ID, function(x) rank(x, 
    ties.method = "first"))))
dt[order(dt$id), -4]

Yields:

   ID Value Order Order.by.group
1  A1     4     5              1
2  A2     3     3              2
3  A4     1     1              1
4  A2     3     4              3
5  A1     4     6              2
6  A4     6     8              2
7  A3     6     9              2
8  A2     1     2              1
9  A1     8    10              3
10 A3     4     7              1

EDIT:

If you don't care about preserving the original order of the data then this works with less code:

dt <- dt[order(dt$ID),]
dt$Order.by.group <- unlist(with(dt, tapply(Value, ID, function(x) rank(x, 
   ties.method= "first"))))

   ID Value Order.by.group
1  A1     4              1
5  A1     4              2
9  A1     8              3
2  A2     3              2
4  A2     3              3
8  A2     1              1
7  A3     6              2
10 A3     4              1
3  A4     1              1
6  A4     6              2

Upvotes: 4

Related Questions