Reputation: 792
I have a matrix filled with somewhat random elements. I need every row sorted in decreasing order, then a function is called on the matrix, and finally the resulting matrix needs to be unsorted to the original order.
This is quickly accomplished vector-wise as shown here, but what's the fastest way to do this to every row in a matrix? Right now I'm doing:
# Example matrix
m <- matrix(runif(100), nrow = 25, ncol = 4)
# Get the initial order by row
om <- t(apply(m, 1, order, decreasing = T))
sm <- m
for (i in seq_len(nrow(m))) {
sm[i, ] <- sm[i, om[i, ]]
}
# ** Operations performed on sm **
# Then unsort
for (i in seq_len(nrow(m))) {
sm[i, ] <- sm[i, order(om[i, ])]
}
# sm is now sorted by-row in the same order as m
Is there some way given om
in the above to sort and unsort while avoiding the for loop or an apply function (both of which make this operation very slow for big m). Thanks!
Edit: There are pointers here: Fastest way to sort each row of a large matrix in R The operation is done inside a function that is already called using parallel, so this operation must be done using serial code.
Upvotes: 0
Views: 575
Reputation: 72828
Row-wise sorting seems to be straightforward. To get the original order back (un-sort) we need the row-wise ranks rather than their order. Thereafter, what works for column sorting in @Josh O'Brien's answer we can adapt for rows.
Base R solution:
rr <- t(apply(m, 1, rank)) ## get initial RANKS by row
sm <- t(apply(m, 1, sort)) ## sort m
## DOING STUFF HERE ##
sm[] <- sm[cbind(as.vector(row(rr)), as.vector(rr))] ## un-sort
all(m == sm) ## check
# [1] TRUE
Seems to work.
In your linked answer, the rowSort
function of the Rfast
package stands out well in terms of performance, which may cover the sorting issue. Moreover there's also a rowRanks
function that will cover our ranking issue. So we can avoid apply
.
Let's try it out.
m[1:3, ]
# [,1] [,2] [,3] [,4]
# [1,] 0.9148060 0.5142118 0.3334272 0.719355838
# [2,] 0.9370754 0.3902035 0.3467482 0.007884739
# [3,] 0.2861395 0.9057381 0.3984854 0.375489965
library(Rfast)
rr <- rowRanks(m) ## get initial RANKS by row
sm <- rowSort(m) ## sort m
sm[1:3, ] # check
# [,1] [,2] [,3] [,4]
# [1,] 0.36106962 0.4112159 0.6262453 0.6311956
# [2,] 0.01405302 0.2171577 0.5459867 0.6836634
# [3,] 0.07196981 0.2165673 0.5739766 0.6737271
## DOING STUFF HERE ##
sm[] <- sm[cbind(as.vector(row(rr)), as.vector(rr))] ## un-sort
all(sm == m) ## check
# [1] TRUE
Dito.
m.test <- matrix(runif(4e6), ncol = 4)
dim(m.test)
# [1] 1000000 4
# Unit: milliseconds
# expr min lq mean median uq max neval cld
# Rfast 897.6286 910.91 956.6259 924.1914 986.1246 1048.058 3 a
# baseR 87931.2824 88004.73 95659.8671 88078.1737 99524.1594 110970.145 3 c
# forloop 58927.7784 59434.54 60317.3903 59941.2930 61012.1963 62083.100 3 b
Not so bad!!
Data/Code:
set.seed(42)
m <- matrix(runif(100), nrow = 25, ncol = 4)
## benchmark
m.test <- matrix(runif(4e6), ncol = 4)
microbenchmark::microbenchmark(
Rfast={
rr <- rowRanks(m.test)
sm <- rowSort(m.test)
sm[] <- sm[cbind(as.vector(row(rr)), as.vector(rr))]},
baseR={
rr <- t(apply(m.test, 1, rank))
sm <- t(apply(m.test, 1, sort))
sm[] <- sm[cbind(as.vector(row(rr)), as.vector(rr))]
},
forloop={
om <- t(apply(m.test, 1, order, decreasing = T))
sm <- m.test
for (i in seq_len(nrow(m.test))) {
sm[i, ] <- sm[i, om[i, ]]
}
for (i in seq_len(nrow(m.test))) {
sm[i, ] <- sm[i, order(om[i, ])]
}
}, times=3L
)
Upvotes: 3