Reputation: 131
I want to find the location of the minimum or maximum value of a data frame or a matrix.
For example, take the minimum of a matrix (ignoring ties for now):
B<-matrix(c(1.5,2,3,4,5,5,4,3,2,1,2,4,6,8,10),nrow=3,ncol=5)
B
     [,1] [,2] [,3] [,4] [,5]
[1,]  1.5    4    4    1    6
[2,]  2.0    5    3    2    8
[3,]  3.0    5    2    4   10
The output I want is:
row.number = 1
column.number = 4
I tried which.min and which.max. They only return the linear position, as if the input were a vector (here which.min returns the single number 10).
Thanks in advance!
Upvotes: 0
Views: 144
Reputation: 160407
While which.min and friends do not support this directly, which(..., arr.ind=TRUE) does:
which(B == min(B), arr.ind=TRUE)
#      row col
# [1,]   1   4
Two important caveats apply here:
If the minimum occurs more than once, this returns one row per tied position rather than a single location; and
This relies on exact floating-point equality, which is prone to the problems described in "Why are these numbers not equal?" and R FAQ 7.31. So while this probably works most of the time, it is possible that it will not always work. When it fails, it returns a 0-row matrix. One mitigating step would be to introduce a tolerance, such as
which(abs(B - min(B)) < 1e-9, arr.ind=TRUE)
#      row col
# [1,]   1   4
where 1e-9 is deliberately small, but "small" is relative to the range of expected values in the matrix.
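One way to make the tolerance relative rather than absolute is to scale it by the magnitude of the values. The helper below, `which_near_min`, is a hypothetical name (not a base R function), and the sketch assumes a numeric matrix with no NA values. It borrows the default tolerance of `all.equal`, `sqrt(.Machine$double.eps)`, which is one reasonable choice among several:

```r
# Hypothetical helper: positions of all entries within a relative
# tolerance of the minimum. Assumes x is numeric with no NA/NaN values.
which_near_min <- function(x, tol = sqrt(.Machine$double.eps)) {
  m <- min(x)
  # Scale the tolerance by the largest magnitude in x
  # (at least 1, so the tolerance never collapses to zero)
  scale <- max(1, max(abs(x)))
  which(abs(x - m) <= tol * scale, arr.ind = TRUE)
}

B <- matrix(c(1.5,2,3,4,5,5,4,3,2,1,2,4,6,8,10), nrow = 3, ncol = 5)
which_near_min(B)

# Two entries that are equal on paper but differ by floating-point error
B2 <- B
B2[1, 4] <- 0.3
B2[3, 3] <- 0.1 + 0.2          # not exactly 0.3 in double precision
exact <- which(B2 == min(B2), arr.ind = TRUE)   # finds only one position
near  <- which_near_min(B2)                     # finds both positions
```

With exact equality only one of the two near-equal entries in `B2` is found, while the tolerance-based version returns both, so the caller can decide how to handle the tie.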
Honestly, which.min (or which.max) gives you enough information, provided you know the dimensions of the matrix.
m <- which.min(B)
c( (m-1) %% nrow(B) + 1, (m-1) %/% nrow(B) + 1 )
# [1] 1 4
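The question also mentions data frames. `which.min()` expects a numeric vector, so one option (a sketch, assuming every column is numeric) is to convert with `as.matrix()` first. The data frame `df` below is made up for illustration and is not from the question:

```r
# Illustrative data frame (an assumption for this example); all columns numeric
df <- data.frame(a = c(1.5, 2, 3), b = c(4, 5, 5), c = c(1, 2, 4))

M <- as.matrix(df)     # numeric matrix with the same shape as df
m <- which.min(M)      # linear (column-major) index of the first minimum

# Same modulus / integer-division arithmetic as for a plain matrix
loc <- c(row = (m - 1) %% nrow(M) + 1,
         col = (m - 1) %/% nrow(M) + 1)
loc

# Column name of the minimum, if that is more useful than its number
col_name <- names(df)[loc["col"]]
col_name
```

If the data frame mixes numeric and non-numeric columns, `as.matrix()` produces a character matrix, so those columns would need to be dropped before this approach applies.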
For background, a matrix in R is just a vector with a dim attribute, stored column by column.
matrix(1:15, nrow=3)
#      [,1] [,2] [,3] [,4] [,5]
# [1,]    1    4    7   10   13
# [2,]    2    5    8   11   14
# [3,]    3    6    9   12   15
So we can use the modulus %% and integer-division (floor) %/% to determine the row and column number, respectively:
(1:15-1) %% 3 + 1
# [1] 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3
(1:15-1) %/% 3 + 1
# [1] 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5
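Base R also provides `arrayInd()`, which performs this index arithmetic for arrays of any dimension. A minimal sketch using the question's matrix `B` (redefined here so the snippet is self-contained), alongside the manual arithmetic for comparison:

```r
B <- matrix(c(1.5,2,3,4,5,5,4,3,2,1,2,4,6,8,10), nrow = 3, ncol = 5)

# Linear (column-major) index of the first minimum
m <- which.min(B)

# Convert the linear index to (row, column) subscripts
pos <- arrayInd(m, dim(B))
pos

# The manual modulus / integer-division arithmetic gives the same answer
manual <- c((m - 1) %% nrow(B) + 1, (m - 1) %/% nrow(B) + 1)
manual
```

`arrayInd()` returns a one-row matrix rather than a plain vector, which is convenient when several linear indices are converted at once.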
And it turns out that this last method is faster, roughly twice as fast on this small matrix (not too surprising, considering the hard part is done in C):
microbenchmark::microbenchmark(
  a = which(B == min(B), arr.ind=TRUE),             # first answer, imperfect
  b = which(abs(B - min(B)) < 1e-9, arr.ind=TRUE),  # second, technically more correct
  c = {                                             # third, still correct, faster
    m <- which.min(B)
    c( (m-1) %% nrow(B) + 1, (m-1) %/% nrow(B) + 1 )
  }, times=10000)
# Unit: microseconds
#  expr min  lq     mean median   uq   max neval
#     a 8.4 9.0 10.27770    9.5 10.4  93.5 10000
#     b 9.0 9.6 10.94061   10.3 11.1 158.4 10000
#     c 3.3 4.0  4.48250    4.2  4.7  38.7 10000
Upvotes: 3