Brandon Bertelsen

Reputation: 44648

How to get ranks with no gaps when there are ties among values?

When there are ties in the original data, is there a way to create a ranking without gaps in the ranks (consecutive, integer rank values)? Suppose:

x <-  c(10, 10, 10, 5, 5, 20, 20)
rank(x)
# [1] 4.0 4.0 4.0 1.5 1.5 6.5 6.5

In this case the desired result would be:

my_rank(x)
[1] 2 2 2 1 1 3 3

I've tried the ties.method options (average, max, min, random), none of which is designed to produce the desired result.
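
For illustration, here is a small sketch of what a few ties.method settings return for the x above (expected outputs shown as comments). Each one either skips rank values or splits tied values apart, so none gives consecutive integer ranks:

```r
x <- c(10, 10, 10, 5, 5, 20, 20)

# "min": every tied value gets the lowest rank in its group, leaving gaps
r_min <- rank(x, ties.method = "min")
r_min
# [1] 3 3 3 1 1 6 6

# "max": every tied value gets the highest rank in its group, leaving gaps
r_max <- rank(x, ties.method = "max")
r_max
# [1] 5 5 5 2 2 7 7

# "first": no gaps, but tied values receive different ranks
r_first <- rank(x, ties.method = "first")
r_first
# [1] 3 4 5 1 2 6 7
```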

Is it possible to achieve this with the rank() function?

Upvotes: 16

Views: 4700

Answers (8)

s_baldur

Reputation: 33488

If you don't mind leaving base R:

library(data.table)
frank(x, ties.method = "dense")
[1] 2 2 2 1 1 3 3

data:

x <- c(10, 10, 10, 5, 5, 20, 20)

Upvotes: 5

tmfmnk

Reputation: 39858

For those fond of using dplyr:

library(dplyr)
dense_rank(x)

[1] 2 2 2 1 1 3 3

Upvotes: 3

Marek

Reputation: 50704

A modification of crayola's solution, using match() instead of merge():

x_unique <- unique(x)
x_ranks <- rank(x_unique)
x_ranks[match(x,x_unique)]

Edit: or as a one-liner, as per @hadley's comment:

match(x, sort(unique(x)))
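
Wrapped as a function, the one-liner might look like the sketch below. The name my_rank is borrowed from the question and is otherwise hypothetical. Since sort() drops NA by default, missing values in x come back as NA rather than receiving a rank:

```r
# my_rank is a hypothetical wrapper name, taken from the question
my_rank <- function(x) {
  # position of each value among the sorted distinct values
  match(x, sort(unique(x)))
}

x <- c(10, 10, 10, 5, 5, 20, 20)
my_rank(x)
# [1] 2 2 2 1 1 3 3

# NA is not among the sorted unique values, so it stays NA
my_rank(c(3, NA, 1, 3))
# [1]  2 NA  1  2
```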

Upvotes: 16

BENY

Reputation: 323226

Another way to think about it: convert the vector to a factor and take its integer codes.

x <-  c(10,10,10,5,5,20,20)
as.numeric(as.factor(x))
[1] 2 2 2 1 1 3 3
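
One caveat, sketched below: this relies on the factor levels being sorted numerically, which is what happens for a numeric x. If the same values are stored as character strings, the levels are sorted as text (the outputs shown assume a typical locale where "10" < "20" < "5"), so the ranks can differ:

```r
x_num <- c(10, 10, 10, 5, 5, 20, 20)
x_chr <- as.character(x_num)

# numeric input: levels are 5, 10, 20
r_num <- as.integer(factor(x_num))
r_num
# [1] 2 2 2 1 1 3 3

# character input: levels are "10", "20", "5" (text order)
r_chr <- as.integer(factor(x_chr))
r_chr
# [1] 1 1 1 3 3 2 2
```

Converting with as.numeric() before building the factor avoids the surprise when the strings really are numbers.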

Upvotes: 4

Prasad Chalasani

Reputation: 20282

The "loopless" way to do it is to simply treat the vector as an ordered factor, then convert it to numeric:

> as.numeric( ordered( c( 10,10,10,10, 5,5,5, 10, 10 ) ) )
[1] 2 2 2 2 1 1 1 2 2
> as.numeric( ordered( c(0.5,0.56,0.76,0.23,0.33,0.4) ))
[1] 4 5 6 1 2 3
> as.numeric( ordered( c(1,1,2,3,4,5,8,8) ))
[1] 1 1 2 3 4 5 6 6

Update: Another way, which seems faster, is to use findInterval() with sort(unique(x)):

> x <- c( 10, 10, 10, 10, 5,5,5, 10, 10)
> findInterval( x, sort(unique(x)))
[1] 2 2 2 2 1 1 1 2 2

> x <- round( abs( rnorm(1000000)*10))
> system.time( z <- as.numeric( ordered( x )))
   user  system elapsed 
  0.996   0.025   1.021 
> system.time( z <- findInterval( x, sort(unique(x))))
   user  system elapsed 
  0.077   0.003   0.080 
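
As a sanity check, the following sketch confirms that the ordered-factor approach, findInterval() and match() agree on the same kind of data. The seed and the smaller sample size are arbitrary choices for illustration, not part of the timings above:

```r
set.seed(1)  # arbitrary seed, only to make the sketch reproducible
x <- round(abs(rnorm(1000) * 10))
su <- sort(unique(x))

r_factor   <- as.integer(ordered(x))  # integer codes of the ordered factor
r_interval <- findInterval(x, su)     # index of the interval each value falls in
r_match    <- match(x, su)            # position among the sorted unique values

# all three give the same dense ranks, running from 1 to length(su)
stopifnot(identical(r_factor, r_interval),
          identical(r_interval, r_match),
          max(r_match) == length(su))
```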

Upvotes: 9

crayola

Reputation: 1678

Here is another function that does this, though it seems inefficient. There is no for loop, but I doubt it is more efficient than Sacha's suggestion!

x=c(1,1,2,3,4,5,8,8)
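fancy.rank <- function(x) {
    x.unique <- unique(x)
    # keep the original positions, because merge() sorts its result by x
    d1 <- data.frame(x = x, idx = seq_along(x))
    d2 <- data.frame(x = x.unique, r = rank(x.unique))
    m <- merge(d1, d2, by = "x")
    # restore the input order before returning the ranks
    m$r[order(m$idx)]
}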

fancy.rank(x)

[1] 1 1 2 3 4 5 6 6

Upvotes: 2

Chase

Reputation: 69171

What about sort()?

x <- c(1,1,2,3,4,5)
sort(x)
[1] 1 1 2 3 4 5

Upvotes: -1

Sacha Epskamp

Reputation: 47551

I can think of a quick function to do this. It's not optimal, since it uses a for loop, but it works :)

x=c(1,1,2,3,4,5,8,8)

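foo <- function(x){
    su <- sort(unique(x))
    # write ranks to a separate vector; overwriting x in place can
    # confuse an assigned rank with a later data value, e.g. c(0, 1)
    out <- integer(length(x))
    for (i in seq_along(su)) out[x == su[i]] <- i
    return(out)
}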

foo(x)

[1] 1 1 2 3 4 5 6 6

Upvotes: 4
