I am trying to get the rank of each observation in a matrix, taking into account NaNs and values that can repeat themselves.
E.g. if we have
A = [0.1 0.15 0.3; 0.5 0.15 0.1; NaN 0.2 0.4];
A =
0.1000 0.1500 0.3000
0.5000 0.1500 0.1000
NaN 0.2000 0.4000
Then I want to get the following output:
B =
1 2 4
6 2 1
NaN 3 5
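Thus 0.1 is the lowest value (rank = 1) and 0.5 is the highest (rank = 6). Equal values share a rank (both 0.15 entries get rank 2), and the next distinct value takes the following rank with no gap (0.2 gets rank 3).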
Ideally an efficient solution without loops.
You can use unique. It sorts the data by default, and its third output gives, for each element, the index of its value within the sorted list of unique values. This replicates your tie behaviour, since identical values get the same index. You can omit the NaN values with logical indexing.
r = A; % or NaN(size(A))
nanIdx = isnan(A); % Get indices of NaNs in A to ignore
[~, ~, r(~nanIdx)] = unique(A(~nanIdx)) % Assign non-NaN values to their 'unique' index
r =
1 2 4
6 2 1
NaN 3 5
If you have the Statistics Toolbox (now the Statistics and Machine Learning Toolbox), you can use the tiedrank function for a similar, but not identical, result.
r = reshape(tiedrank(A(:)), size(A)) % Rank all elements together via A(:); tiedrank on a matrix ranks each column separately
r =
1.5000 3.5000 6.0000
8.0000 3.5000 1.5000
NaN 5.0000 7.0000
This is not the result you asked for in your example. tiedrank uses a more conventional ranking scheme than yours, in which tied values each receive the average of the ranks they occupy. For example, a tie for 1st and 2nd gives each value rank 1.5, and the next value gets rank 3. Your scheme, where ties share a rank and no ranks are skipped, is often called dense ranking.
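To see the two tie conventions side by side without needing the toolbox, here is a small sketch that computes both on the example matrix. The average-rank part is only a hand-rolled illustration of the convention just described (rank by sorted position, then give each tie group the mean of its positions); it is not tiedrank's actual implementation, and the variable names are made up for this example.

```matlab
% Example matrix from the question
A = [0.1 0.15 0.3; 0.5 0.15 0.1; NaN 0.2 0.4];
valid = ~isnan(A);
v = A(valid);                 % non-NaN values as a column vector

% Dense ranking (the question's convention): each element gets the index
% of its value in the sorted list of unique values, so ties share a rank.
[~, ~, denseIdx] = unique(v);
denseIdx = denseIdx(:);       % force a column vector for accumarray

% Average ranking (the convention described for tiedrank):
% 1) record each element's position in the sorted order (sort is stable),
% 2) replace each tie group's positions with their mean.
[~, order] = sort(v);
pos = zeros(numel(v), 1);
pos(order) = 1:numel(v);
avgRank = accumarray(denseIdx, pos, [], @mean);  % mean position per unique value
fracIdx = avgRank(denseIdx);

% Put the ranks back into matrices shaped like A, keeping NaN where A is NaN
dense = NaN(size(A));  dense(valid) = denseIdx;
frac  = NaN(size(A));  frac(valid)  = fracIdx;
disp(dense)
disp(frac)
```

For the example matrix, dense matches B from the question and frac matches the tiedrank output shown above, which makes the difference concrete: the two 0.15 entries get 2 under dense ranking but 3.5 under average ranking.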