Reputation: 473
I have a data frame with a grouping variable 'ID' and some values ('Value'):
dt <- data.frame(
ID = c('A1','A2','A4','A2','A1','A4','A3','A2','A1','A3'),
Value = c(4,3,1,3,4,6,6,1,8,4)
)
dt
# ID Value
# 1 A1 4
# 2 A2 3
# 3 A4 1
# 4 A2 3
# 5 A1 4
# 6 A4 6
# 7 A3 6
# 8 A2 1
# 9 A1 8
# 10 A3 4
I can calculate an overall rank order of the 'Value' column like this:
dt$Order <- rank(dt$Value, ties.method = "first")
dt
# ID Value Order
# 1 A1 4 5
# 2 A2 3 3
# 3 A4 1 1
# 4 A2 3 4
# 5 A1 4 6
# 6 A4 6 8
# 7 A3 6 9
# 8 A2 1 2
# 9 A1 8 10
# 10 A3 4 7
But how can I calculate rank order within each 'ID', instead of a global rank order?
# ID Value rnk
# 1 A1 4 1
# 2 A2 3 2
# 3 A4 1 1
# 4 A2 3 3
# 5 A1 4 2
# 6 A4 6 2
# 7 A3 6 2
# 8 A2 1 1
# 9 A1 8 3
# 10 A3 4 1
In T-SQL, we can get this done as the following syntax:
RANK() OVER ( [ < partition_by_clause > ] < order_by_clause > )
Any idea?
Upvotes: 14
Views: 18454
Reputation: 173737
Many options.
If performance is an issue (i.e. very large data), use the data.table package:
library(data.table)
setDT(dt)
# or: dt <- as.data.table(dt)
dt[ , Order := frank(Value, ties.method = "first"), by = ID]
# ID Value Order
# <char> <num> <int>
# 1: A1 4 1
# 2: A2 3 2
# 3: A4 1 1
# 4: A2 3 3
# 5: A1 4 2
# 6: A4 6 2
# 7: A3 6 2
# 8: A2 1 1
# 9: A1 8 3
# 10: A3 4 1
See ?frank
for several other ties methods, e.g. "dense"
dplyr
:
library(dplyr)
dt %>% group_by(ID) %>% mutate(rnk = row_number(Value))
From ?ranking
:
row_number()
: equivalent torank(ties.method = "first")
dplyr
has several other ranking functions, e.g. dense_rank
Or in all its gory detail, a base
R solution using split
, lapply
, do.call
and rbind
:
do.call(rbind, lapply(split(dt, dt$ID), transform,
Order = rank(Value, ties.method = "first")))
Using ddply
from the plyr package:
library(plyr)
ddply(dt, .(ID), transform, Order = rank(Value, ties.method = "first"))
ID Value Order
1 A1 4 1
2 A1 4 2
3 A1 8 3
4 A2 3 2
5 A2 3 3
6 A2 1 1
7 A3 6 2
8 A3 4 1
9 A4 1 1
10 A4 6 2
See revision history for data.table
alternatives for older package versions.
Upvotes: 14
Reputation: 270448
Here are a couple of approaches:
ave This takes each set of Value numbers that have the same ID and applies rank separately to each such set. No packages are used.
Rank <- function(x) rank(x, ties.method = "first")
transform(dt, rank = ave(Value, ID, FUN = Rank))
giving:
ID Value rank
1 A1 4 1
2 A2 3 2
3 A4 1 1
4 A2 3 3
5 A1 4 2
6 A4 6 2
7 A3 6 2
8 A2 1 1
9 A1 8 3
10 A3 4 1
Note that the above solution keeps the original row order. It could be sorted afterwards if that were desired.
sqldf with RPostgreSQL
# see FAQ #12 on the sqldf github home page for info on sqldf and PostgreSQL
# https://cran.r-project.org/web/packages/sqldf/README.html
library(RPostgreSQL)
library(sqldf)
sqldf('select
*,
rank() over (partition by "ID" order by "Value") rank
from "dt"
')
This solution reorders the rows. It is assumed that that is ok since your example solution did that (but if not append a sequence number column to dt
and add an appropriate order by clause to re-order the result back into the sequence number order).
Upvotes: 6
Reputation: 1
You can use the data.table package.
setDT(dt)
dt[, Order := rank(Value, ties.method = "first"), by = "ID"]
dt <- as.data.frame(dt)
giving the desired output:
ID Value Order
1 A1 4 1
2 A2 3 2
3 A4 1 1
4 A2 3 3
5 A1 4 2
6 A4 6 2
7 A3 6 2
8 A2 1 1
9 A1 8 3
10 A3 4 1
Upvotes: 0
Reputation: 110072
My way but there's likely better. Never used rank, din't even know about it. Thanks, may be useful.
#Your Data
dt <- data.frame(
ID = c('A1','A2','A4','A2','A1','A4','A3','A2','A1','A3'),
Value = c(4,3,1,3,4,6,6,1,8,4)
)
dt$Order <- rank(dt$Value,ties.method= "first")
#My approach
dt$id <- 1:nrow(dt) #needed for ordering and putting things back together
dt <- dt[order(dt$ID),]
dt$Order.by.group <- unlist(with(dt, tapply(Value, ID, function(x) rank(x,
ties.method = "first"))))
dt[order(dt$id), -4]
Yields:
ID Value Order Order.by.group
1 A1 4 5 1
2 A2 3 3 2
3 A4 1 1 1
4 A2 3 4 3
5 A1 4 6 2
6 A4 6 8 2
7 A3 6 9 2
8 A2 1 2 1
9 A1 8 10 3
10 A3 4 7 1
EDIT:
If you don't care about preserving the original order of the data then this works with less code:
dt <- dt[order(dt$ID),]
dt$Order.by.group <- unlist(with(dt, tapply(Value, ID, function(x) rank(x,
ties.method= "first"))))
ID Value Order.by.group
1 A1 4 1
5 A1 4 2
9 A1 8 3
2 A2 3 2
4 A2 3 3
8 A2 1 1
7 A3 6 2
10 A3 4 1
3 A4 1 1
6 A4 6 2
Upvotes: 4