jalapic

Reputation: 14192

Ranking NAs in a vector equally [r]

I'm wondering if I'm missing something trivial here:

When ranking a vector that contains NAs, such as the one below, rank() offers four options for handling the NAs through its na.last argument:

x <- c(5, NA, 3, NA, 6, 9, 10, NA, 5, 7, 12)

rank(x, na.last = TRUE)
# [1]  2.5  9.0  1.0 10.0  4.0  6.0  7.0 11.0  2.5  5.0  8.0

rank(x, na.last = FALSE)
# [1]  5.5  1.0  4.0  2.0  7.0  9.0 10.0  3.0  5.5  8.0 11.0

rank(x, na.last = NA)
# [1] 2.5 1.0 4.0 6.0 7.0 2.5 5.0 8.0

rank(x, na.last = "keep")
#  [1] 2.5  NA 1.0  NA 4.0 6.0 7.0  NA 2.5 5.0 8.0

I want to keep the NAs and rank them. For my purposes they should be ranked equally and last. The default ties.method, "average", is fine here. This is the result I'm looking for:

#  [1] 2.5  10.0 1.0  10.0 4.0 6.0 7.0  10.0 2.5 5.0 8.0

From the ?rank help: "NA values are never considered to be equal: for na.last = TRUE and na.last = FALSE they are given distinct ranks in the order in which they occur in x."

So it looks like what I want, i.e. treating the NAs as equal and giving them all the average of the last ranks, is not possible with rank alone. Is that true, and is there no simple way to do this with rank? Do I have to use a second line of code to fill in the NA ranks after calling rank(x, na.last = "keep")?
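For reference, the two-step approach described in the question can be sketched as follows. This is an illustration rather than code from the original post: it ranks with na.last = "keep" and then gives every NA the mean of the block of ranks the NAs would occupy, (m + 1 + n) / 2, where m is the number of non-NA values and n is the length of the vector.

```r
x <- c(5, NA, 3, NA, 6, 9, 10, NA, 5, 7, 12)

# Step 1: rank only the non-NA values; NAs stay NA
r <- rank(x, na.last = "keep")

# Step 2: the NAs would occupy ranks (m + 1) through n, so give them
# all the average of that block, (m + 1 + n) / 2
m <- sum(!is.na(x))
n <- length(x)
r[is.na(r)] <- (m + 1 + n) / 2

print(r)
```

Here m = 8 and n = 11, so each NA receives (9 + 11) / 2 = 10, matching the desired output.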

Upvotes: 5

Views: 341

Answers (2)

josliber

Reputation: 44310

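You could rank the vector both forwards and backwards and then take the mean. With na.last = TRUE the NAs get distinct last ranks in order of occurrence, so reversing the vector reverses the order of those NA ranks; averaging the two passes gives every NA the same rank, while the non-NA ranks are identical in both passes and stay unchanged: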

(rank(x, na.last = TRUE) + rev(rank(rev(x), na.last = TRUE))) / 2
# [1]  2.5 10.0  1.0 10.0  4.0  6.0  7.0 10.0  2.5  5.0  8.0
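A minimal sketch wrapping this idea in a reusable function; the name rank_na_equal is hypothetical and not part of base R. It also checks the property the trick relies on: with k NAs in a vector of length n, every NA should end up with (n - k + 1 + n) / 2, the mean of the last k ranks. This assumes the default ties.method = "average".

```r
# Hypothetical helper: rank x with all NAs tied for last place,
# using the forward/backward averaging trick from this answer
rank_na_equal <- function(x) {
  fwd <- rank(x, na.last = TRUE)            # NAs get distinct last ranks, in order
  bwd <- rev(rank(rev(x), na.last = TRUE))  # same, but the NA order is reversed
  (fwd + bwd) / 2                           # NA ranks average out; others unchanged
}

x <- c(5, NA, 3, NA, 6, 9, 10, NA, 5, 7, 12)
r <- rank_na_equal(x)

# Every NA should receive the mean of the last k ranks
n <- length(x)
k <- sum(is.na(x))
stopifnot(all(r[is.na(x)] == (2 * n - k + 1) / 2))
print(r)
```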

Upvotes: 2

thelatemail

Reputation: 93813

I'm not sure if this is the most elegant solution, but you could replace the NA values with a value larger than every non-NA value (here max(x, na.rm = TRUE) + 1), so that they tie with each other and always rank last:

rank( replace(x, is.na(x), max(x,na.rm=TRUE) + 1) )
#[1]  2.5 10.0  1.0 10.0  4.0  6.0  7.0 10.0  2.5  5.0  8.0
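A possible variant, not from the original answer, is to use Inf as the replacement value. Inf sorts after every finite value, no max() call is needed (so an all-NA vector does not trigger a warning), and it avoids a floating-point pitfall: for very large values, max(x) + 1 can be exactly equal to max(x) in double precision, which would tie the NAs with the largest real value.

```r
x <- c(5, NA, 3, NA, 6, 9, 10, NA, 5, 7, 12)

# Replace NAs with Inf so they all tie with each other and sort last
r <- rank(replace(x, is.na(x), Inf))
print(r)

# Illustration of the pitfall: 1e17 + 1 == 1e17 in double precision,
# so max + 1 ties the NA with 1e17, whereas Inf keeps it strictly last
y <- c(1e17, NA)
print(rank(replace(y, is.na(y), max(y, na.rm = TRUE) + 1)))
print(rank(replace(y, is.na(y), Inf)))
```

As with the max + 1 approach, any values in x that are already Inf will tie with the NAs.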

Upvotes: 3
