seulberg1

Reputation: 1013

R efficient way to sort a Matrix by row

I have a matrix "multiOrderPairsFlat" with 2 million+ rows and 2 columns, where each cell contains a SKU description (e.g. "Pipe2mSteel" or "Bushing1inS"). I would like to sort every row alphabetically, so that in every row, e.g., "Bushing1inS" is in the first column and "Pipe2mSteel" in the second.

However, if I run:

# Iterate over rows; length() on a matrix counts all cells, not rows
for (i in seq_len(nrow(multiOrderPairsFlat))) {
  multiOrderPairsFlat[i, ] <- sort(multiOrderPairsFlat[i, ])
}

It takes forever and I doubt this is the quickest way of dealing with this problem. Do you have any advice on how to solve this more efficiently, e.g. by vectorizing the operation?

Thanks for helping out;) Best seulberg1

Upvotes: 0

Views: 283

Answers (1)

akrun

Reputation: 887008

Since there are only two columns, it may be better to convert the matrix to a data.frame and use pmin/pmax, which return the element-wise minimum and maximum of the two columns:

system.time({
  df1 <- as.data.frame(multiOrderPairsFlat, stringsAsFactors = FALSE)
  res <- data.frame(First = do.call(pmin, df1), Second = do.call(pmax, df1))
})
#   user  system elapsed 
#   0.49    0.02    0.50 

system.time({
  for (i in 1:nrow(multiOrderPairsFlat)) {
    multiOrderPairsFlat[i, ] <- sort(multiOrderPairsFlat[i, ])
  }
})
#   user  system elapsed 
#  11.99    0.00   12.00 

all.equal(as.matrix(res), multiOrderPairsFlat, check.attributes=FALSE)
#[1] TRUE
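
As a small sanity check of the same idea, the sketch below applies pmin/pmax directly to the two matrix columns, without the data.frame conversion, and compares the result with a row-wise sort. The three-row matrix `m` and the names `sorted` and `expected` are made up for illustration and are not part of the benchmark above.

```r
# Hypothetical three-row stand-in for multiOrderPairsFlat
m <- cbind(c("Pipe2mSteel", "Bushing1inS", "Pipe2mSteel"),
           c("Bushing1inS", "Pipe2mSteel", "Pipe2mSteel"))

# The element-wise minimum goes in column 1, the maximum in column 2;
# pmin/pmax also accept character vectors
sorted <- cbind(pmin(m[, 1], m[, 2]), pmax(m[, 1], m[, 2]))

# Reference result: sort each row individually, then transpose back
expected <- t(apply(m, 1, sort))

# Stops with an error if the two approaches disagree
stopifnot(isTRUE(all.equal(sorted, expected, check.attributes = FALSE)))
```

This only works because there are exactly two columns: with two values per row, the sorted row is simply (min, max).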

Checking the memory allocation with profvis:

library(profvis)

profvis({
  df1 <- as.data.frame(multiOrderPairsFlat, stringsAsFactors = FALSE)
  res <- data.frame(First = do.call(pmin, df1), Second = do.call(pmax, df1))
})
# 3.3 MB

profvis({
  for (i in 1:nrow(multiOrderPairsFlat)) {
    multiOrderPairsFlat[i, ] <- sort(multiOrderPairsFlat[i, ])
  }
})
# 12.8 MB

Data

set.seed(24)
multiOrderPairsFlat <- cbind(sample(c("Pipe2mSteel" , "Bushing1inS"), 1e6, replace=TRUE),
    sample(c("Pipe2mSteel" , "Bushing1inS"), 1e6, replace=TRUE))

Upvotes: 2
