Reputation: 14192
I'm wondering if I'm missing something trivial here:
When ranking a vector that contains NAs, like the one below, rank() offers four options for handling them via na.last:
x <- c(5, NA, 3, NA, 6, 9, 10, NA, 5, 7, 12)
rank(x, na.last=TRUE)
# [1] 2.5 9.0 1.0 10.0 4.0 6.0 7.0 11.0 2.5 5.0 8.0
rank(x, na.last=FALSE)
# [1] 5.5 1.0 4.0 2.0 7.0 9.0 10.0 3.0 5.5 8.0 11.0
rank(x, na.last=NA)
# [1] 2.5 1.0 4.0 6.0 7.0 2.5 5.0 8.0
rank(x, na.last="keep")
# [1] 2.5 NA 1.0 NA 4.0 6.0 7.0 NA 2.5 5.0 8.0
I want to keep the NAs and rank them. For my purposes they should all receive the same rank, and that rank should come last. The default ties.method, "average", is fine here. I'm looking for this result:
# [1] 2.5 10.0 1.0 10.0 4.0 6.0 7.0 10.0 2.5 5.0 8.0
From the ?rank help: "NA values are never considered to be equal: for na.last = TRUE and na.last = FALSE they are given distinct ranks in the order in which they occur in x."
So it looks like what I want, i.e. treating the NAs as equal and giving them their averaged rank in last place, is not possible with rank alone. Is this true? Is there no simple way of getting this done via rank? Do I have to rely on a second line of code to re-insert the rank of the NAs after doing rank(x, na.last="keep")?
Upvotes: 5
Views: 341
Reputation: 44310
You could rank it both forwards and backwards and then take the mean:
(rank(x, na.last=TRUE) + rev(rank(rev(x), na.last=TRUE))) / 2
# [1] 2.5 10.0 1.0 10.0 4.0 6.0 7.0 10.0 2.5 5.0 8.0
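To see why the averaging works, it may help to look at the two intermediate vectors. This is only a sketch using the x from the question; the names fwd and bwd are made up for illustration. In the forward pass the NAs get distinct ranks in order of occurrence, in the backward pass they get the mirrored ranks, and the non-NA ranks are the same in both, so only the NA ranks change when averaged.

```r
x <- c(5, NA, 3, NA, 6, 9, 10, NA, 5, 7, 12)

# Forward pass: the three NAs receive ranks 9, 10, 11 in order of appearance
fwd <- rank(x, na.last = TRUE)

# Backward pass: rank the reversed vector, then flip the result back,
# so the same three NAs now carry ranks 11, 10, 9
bwd <- rev(rank(rev(x), na.last = TRUE))

# Non-NA ranks are identical in both passes; each NA pair averages to 10
res <- (fwd + bwd) / 2
res
```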
Upvotes: 2
Reputation: 93813
I'm not sure if this is the most elegant solution, but you could replace the NA values so that they are always last, like so:
rank( replace(x, is.na(x), max(x,na.rm=TRUE) + 1) )
#[1] 2.5 10.0 1.0 10.0 4.0 6.0 7.0 10.0 2.5 5.0 8.0
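If relying on max(x) + 1 is a concern (for example with character vectors, where adding 1 is not defined), the two-step route mentioned in the question can be wrapped in a small helper. This is a rough sketch, not a built-in: rank_na_tied_last is a hypothetical name, and it assumes the default "average" ties method is what you want for the NAs.

```r
# Hypothetical helper: rank non-NA values as usual, then give every NA
# the average of the last n_na rank positions.
rank_na_tied_last <- function(x) {
  na   <- is.na(x)
  n_ok <- sum(!na)   # number of non-NA values
  n_na <- sum(na)    # number of NA values
  r <- rank(x, na.last = "keep")
  # The NAs would occupy ranks n_ok + 1, ..., n_ok + n_na; their mean is:
  r[na] <- n_ok + (n_na + 1) / 2
  r
}

x <- c(5, NA, 3, NA, 6, 9, 10, NA, 5, 7, 12)
res_num <- rank_na_tied_last(x)

# Also usable on a character vector, where max(x) + 1 would fail
res_chr <- rank_na_tied_last(c("b", NA, "a", NA))
```

For the x above, the three NAs would take positions 9, 10 and 11, so each ends up with rank 10, the same as the replace() approach.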
Upvotes: 3