Reputation: 2663
I have a question I can't figure out, which I'm almost certain involves rank
. Let's say that I have a df
in wide form with 3 variables with integer values.
id var1 var2 var3
1 23 8 30
2 1 2 3
3 4 5 1
4 100 80 60
I'd like to create three new variables with the rank of the values for var1
, var2
, and var3
from largest to smallest. For example,
id var1 var2 var3 var1_rank var2_rank var3_rank
1 23 8 30 2 3 1
2 1 2 3 3 2 1
3 4 5 1 2 1 3
4 100 80 60 1 2 3
How would I go about doing this? Thanks!
Upvotes: 1
Views: 2905
Reputation: 93938
Get the example data:
test <- read.table(text="id var1 var2 var3
1 23 8 30
2 1 2 3
3 4 5 1
4 100 80 60",header=TRUE)
Get the ranks part and rename appropriately (notice the -x
to reverse the rank so it relates to decreasing instead of increasing size - this will be generalisable to any size of data.frame
used as input):
ranks <- t(apply(test[,-1], 1, function(x) rank(-x) ))
colnames(ranks) <- paste(colnames(ranks), "_rank", sep="")
Join with the old data frame.
data.frame(test, ranks)
Result:
> data.frame(test,ranks)
id var1 var2 var3 var1_rank var2_rank var3_rank
1 1 23 8 30 2 3 1
2 2 1 2 3 3 2 1
3 3 4 5 1 2 1 3
4 4 100 80 60 1 2 3
To get to @mnel's answer using base R, you could also do something like:
testres <- data.frame(test["id"],stack(test[2:4]))
testres$rank <- ave(testres$values,testres$id,FUN=function(x) rank(-x) )
> testres
id values ind rank
1 1 23 var1 2
2 2 1 var1 3
3 3 4 var1 2
4 4 100 var1 1
5 1 8 var2 3
6 2 2 var2 2
7 3 5 var2 1
8 4 80 var2 2
9 1 30 var3 1
10 2 3 var3 1
11 3 1 var3 3
12 4 60 var3 3
Upvotes: 3
Reputation: 115515
I think it is easier to work in long
format (and more memory efficient, as apply
will coerce to a matrix
. Here is an approach using reshape
and data.table
library(data.table)
tlong <- reshape(data.table(test), direction ='long', varying = list(2:4),
times = paste0('var',1:3), v.names = 'value')
# calculate the rank within each `id`
tlong[, rank := rank(-value), by = id]
tlong
## id time value rank
## 1: 1 var1 23 2
## 2: 2 var1 1 3
## 3: 3 var1 4 2
## 4: 4 var1 100 1
## 5: 1 var2 8 3
## 6: 2 var2 2 2
## 7: 3 var2 5 1
## 8: 4 var2 80 2
## 9: 1 var3 30 1
## 10: 2 var3 3 1
## 11: 3 var3 1 3
## 12: 4 var3 60 3
# reshape to wide (if you want)
oldname <- paste0('var1',1:3)
twide <- reshape(tlong, direction = 'wide', timevar = 'time', idvar = 'id')
# reorder from value.var1, rank.var1,... to value.var1, value.var2,....rank.var1, rank.var2
setcolorder(twide, c('id', paste('value', oldname, sep ='.'), paste('rank', oldname, sep = '.'))
Upvotes: 3
Reputation: 110072
Here's one approach:
data.frame(dat, 4 - t(apply(dat[, -1], 1, rank)))
## > data.frame(dat, 4 - t(apply(dat[, -1], 1, rank)))
## id var1 var2 var3 var1.1 var2.1 var3.1
## 1 1 23 8 30 2 3 1
## 2 2 1 2 3 3 2 1
## 3 3 4 5 1 2 1 3
## 4 4 100 80 60 1 2 3
Upvotes: 2