Charlotte Deng

Reputation: 131

How to find the location (row/column) of the minimum/maximum value of a data frame or a matrix (R question)

I want to find the location of the minimum or maximum value of a data frame or a matrix.

For example, take the minimum of a matrix (ignoring ties for now):

B<-matrix(c(1.5,2,3,4,5,5,4,3,2,1,2,4,6,8,10),nrow=3,ncol=5)
B
     [,1] [,2] [,3] [,4] [,5]
[1,]  1.5    4    4    1    6
[2,]  2.0    5    3    2    8
[3,]  3.0    5    2    4   10

The output I want is:

row.number = 1

column.number = 4

I tried which.min and which.max, but they only return the overall position, as if the input were a vector (which.min(B) returns the single number 10).

Thanks in advance!

Upvotes: 0

Views: 144

Answers (1)

r2evans

Reputation: 160407

While which.min and friends do not support this directly, which(..., arr.ind=TRUE) does:

which(B == min(B), arr.ind=TRUE)
#      row col
# [1,]   1   4
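The question also mentions data frames. As a minimal sketch, assuming every column is numeric, the same call can be applied after converting with as.matrix(). The data frame df below is hypothetical and simply holds the same values as B:

```r
# Hypothetical all-numeric data frame with the same values as B
df <- as.data.frame(matrix(c(1.5,2,3,4,5,5,4,3,2,1,2,4,6,8,10), nrow=3, ncol=5))

M <- as.matrix(df)                        # numeric matrix, column names V1..V5
loc <- which(M == min(M), arr.ind=TRUE)   # same approach as for a plain matrix

row.number    <- unname(loc[, "row"])     # row index of the minimum
column.number <- unname(loc[, "col"])     # column index of the minimum
column.name   <- colnames(df)[column.number]  # name of that column
```

If the data frame mixes numeric and non-numeric columns, as.matrix() produces a character matrix, so the numeric columns would need to be selected first.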

Two important caveats when doing this:

  1. This returns every location that equals the minimum, so ties produce more than one row rather than a single answer; and

  2. This assumes that floating-point equality works, which is prone to the problems described in "Why are these numbers not equal?" and R FAQ 7.31. It probably works most of the time, but it is possible that it will not always work; when it fails, it returns a 0-row matrix. One mitigating step is to introduce a tolerance, such as

    which(abs(B - min(B)) < 1e-9, arr.ind=TRUE)
    #      row col
    # [1,]   1   4
    

    where 1e-9 is deliberately small, but "small" is relative to the range of expected values in the matrix.
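To illustrate both points, here is a small sketch. The matrix Tied is a hypothetical example with a repeated minimum, and scaling the tolerance by max(abs(B)) is only one assumed heuristic for making "small" relative to the data, not a general rule:

```r
# Hypothetical matrix with a tied minimum: the value 1 sits at (2,1) and (2,2)
Tied <- matrix(c(3, 1, 2, 1), nrow=2)
ties <- which(Tied == min(Tied), arr.ind=TRUE)  # one row per tied location
n.ties <- nrow(ties)                            # how many locations share the minimum
first <- ties[1, ]                              # keep only the first, in column-major order

# Tolerance scaled by the largest magnitude in the matrix (an assumed heuristic)
B <- matrix(c(1.5,2,3,4,5,5,4,3,2,1,2,4,6,8,10), nrow=3, ncol=5)
tol <- 1e-9 * max(abs(B))
near <- which(abs(B - min(B)) <= tol, arr.ind=TRUE)
```

Checking nrow() of the result is a simple way to detect ties before deciding which location to keep.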

Faster Alternative

Honestly, which.min (or which.max) gives you enough information, given that you know the dimensions of the matrix.

m <- which.min(B)
c( (m-1) %% nrow(B) + 1, (m-1) %/% nrow(B) + 1 )
# [1] 1 4

For background, a matrix in R is just a vector with dimensions, stored column by column.

matrix(1:15, nrow=3)
#      [,1] [,2] [,3] [,4] [,5]
# [1,]    1    4    7   10   13
# [2,]    2    5    8   11   14
# [3,]    3    6    9   12   15

So we can use the modulus %% and integer division (floor) %/% to determine the row and column number, respectively:

(1:15-1) %% 3 + 1
#  [1] 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3
(1:15-1) %/% 3 + 1
#  [1] 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5
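Base R also ships arrayInd(), which performs this same linear-index-to-subscript conversion. The sketch below compares it with the manual arithmetic on B and applies it to the maximum as well; the variable names are illustrative only, and it has not been benchmarked here against the other approaches:

```r
B <- matrix(c(1.5,2,3,4,5,5,4,3,2,1,2,4,6,8,10), nrow=3, ncol=5)

m <- which.min(B)                                          # linear (column-major) index
manual  <- c((m-1) %% nrow(B) + 1, (m-1) %/% nrow(B) + 1)  # row, column by hand
builtin <- arrayInd(m, dim(B))                             # 1 x 2 matrix: row, column

# The same conversion for the maximum
max.loc <- arrayInd(which.max(B), dim(B))
```

Note that which.min() and which.max() return only the first match, so ties are silently ignored with this approach.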

And it turns out that this last method is much faster (not too surprising, considering the hard part is done in C):

microbenchmark::microbenchmark(
  a = which(B == min(B), arr.ind=TRUE),             # first answer, imperfect
  b = which(abs(B - min(B)) < 1e-9, arr.ind=TRUE),  # second, technically more correct
  c = {                                             # third, still correct, faster
    m <- which.min(B)
    c( (m-1) %% nrow(B) + 1, (m-1) %/% nrow(B) + 1 )
  }, times=10000)
# Unit: microseconds
#  expr min  lq     mean median   uq   max neval
#     a 8.4 9.0 10.27770    9.5 10.4  93.5 10000
#     b 9.0 9.6 10.94061   10.3 11.1 158.4 10000
#     c 3.3 4.0  4.48250    4.2  4.7  38.7 10000

Upvotes: 3
