roody
roody

Reputation: 2663

Using `rank` across columns to create new variable

I have a question I can't figure out, which I'm almost certain involves rank. Let's say that I have a df in wide form with 3 variables with integer values.

id   var1   var2  var3
1    23     8     30
2    1      2     3
3    4      5     1
4    100    80    60

I'd like to create three new variables with the rank of the values for var1, var2, and var3 from largest to smallest. For example,

id   var1   var2  var3   var1_rank   var2_rank   var3_rank
1    23     8     30      2          3            1
2    1      2     3       3          2            1 
3    4      5     1       2          1            3
4    100    80    60      1          2            3       

How would I go about doing this? Thanks!

Upvotes: 1

Views: 2905

Answers (3)

thelatemail
thelatemail

Reputation: 93938

Get the example data:

test <- read.table(text="id   var1   var2  var3
1    23     8     30
2    1      2     3
3    4      5     1
4    100    80    60",header=TRUE)

Get the ranks part and rename appropriately (notice the -x to reverse the rank so it relates to decreasing instead of increasing size - this will be generalisable to any size of data.frame used as input):

ranks <- t(apply(test[,-1], 1, function(x) rank(-x) ))
colnames(ranks) <- paste(colnames(ranks), "_rank", sep="")

Join with the old data frame.

data.frame(test, ranks)

Result:

> data.frame(test,ranks)
  id var1 var2 var3 var1_rank var2_rank var3_rank
1  1   23    8   30         2         3         1
2  2    1    2    3         3         2         1
3  3    4    5    1         2         1         3
4  4  100   80   60         1         2         3

To get to @mnel's answer using base R, you could also do something like:

testres <- data.frame(test["id"],stack(test[2:4]))
testres$rank <- ave(testres$values,testres$id,FUN=function(x) rank(-x) )

> testres
   id values  ind rank
1   1     23 var1    2
2   2      1 var1    3
3   3      4 var1    2
4   4    100 var1    1
5   1      8 var2    3
6   2      2 var2    2
7   3      5 var2    1
8   4     80 var2    2
9   1     30 var3    1
10  2      3 var3    1
11  3      1 var3    3
12  4     60 var3    3

Upvotes: 3

mnel
mnel

Reputation: 115515

I think it is easier to work in long format (and more memory efficient, as apply will coerce to a matrix. Here is an approach using reshape and data.table

library(data.table)
tlong <- reshape(data.table(test), direction ='long', varying = list(2:4), 
                 times = paste0('var',1:3), v.names = 'value')

# calculate the rank within each `id`

tlong[, rank := rank(-value), by = id]
tlong

##     id time value rank
##  1:  1 var1    23    2
##  2:  2 var1     1    3
##  3:  3 var1     4    2
##  4:  4 var1   100    1
##  5:  1 var2     8    3
##  6:  2 var2     2    2
##  7:  3 var2     5    1
##  8:  4 var2    80    2
##  9:  1 var3    30    1
## 10:  2 var3     3    1
## 11:  3 var3     1    3
## 12:  4 var3    60    3

# reshape to wide (if you want)

oldname <- paste0('var1',1:3)

twide <- reshape(tlong, direction = 'wide', timevar = 'time', idvar = 'id')
# reorder from value.var1, rank.var1,... to value.var1, value.var2,....rank.var1, rank.var2

setcolorder(twide, c('id', paste('value', oldname, sep ='.'), paste('rank', oldname, sep = '.'))

Upvotes: 3

Tyler Rinker
Tyler Rinker

Reputation: 110072

Here's one approach:

data.frame(dat, 4 - t(apply(dat[, -1], 1, rank)))

## > data.frame(dat, 4 - t(apply(dat[, -1], 1, rank)))
##   id var1 var2 var3 var1.1 var2.1 var3.1
## 1  1   23    8   30      2      3      1
## 2  2    1    2    3      3      2      1
## 3  3    4    5    1      2      1      3
## 4  4  100   80   60      1      2      3

Upvotes: 2

Related Questions