I have a matrix "multiOrderPairsFlat" with more than 2 million rows and 2 columns, where each cell contains a SKU description (e.g. "Pipe2mSteel" or "Bushing1inS"). I would like to sort every row alphabetically, so that in every row, e.g., "Bushing1inS" is in the first column and "Pipe2mSteel" in the second.
However, if I run:
# loop over rows: length() would count every cell, not every row
for (i in 1:nrow(multiOrderPairsFlat)) {
  multiOrderPairsFlat[i, ] <- sort(multiOrderPairsFlat[i, ])
}
It takes forever and I doubt this is the quickest way of dealing with this problem. Do you have any advice on how to solve this more efficiently, e.g. by vectorizing the operation?
Thanks for helping out;) Best seulberg1
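As there are only two columns, it may be faster to use pmin/pmax after converting to a data.frame. Both functions are vectorized, so each pair of cells is compared element-wise in a single call instead of calling sort() once per row.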
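system.time({
  # The columns become V1 and V2; pmin/pmax compare them element-wise,
  # so the smaller string of each row goes to First, the larger to Second
  df1 <- as.data.frame(multiOrderPairsFlat, stringsAsFactors = FALSE)
  res <- data.frame(First = do.call(pmin, df1), Second = do.call(pmax, df1))
})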
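To see what pmin/pmax do with strings, here is a small sketch on a hypothetical three-row matrix (the values are made up in the question's format). Character comparison follows the locale's collation, the same rule sort() uses for character vectors, so the result should match the row-wise sort.

```r
# Hypothetical toy data in the question's format
m <- cbind(c("Pipe2mSteel", "Bushing1inS", "Pipe2mSteel"),
           c("Bushing1inS", "Pipe2mSteel", "Pipe2mSteel"))

# Element-wise comparison of the two columns: one vectorized call each,
# instead of one sort() call per row
first  <- pmin(m[, 1], m[, 2])
second <- pmax(m[, 1], m[, 2])

cbind(first, second)
```

Every row now has "Bushing1inS" first; the third row, where both entries are equal, comes out unchanged.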
# user system elapsed
# 0.49 0.02 0.50
system.time({
  # Row-by-row loop from the question, for comparison
  for (i in 1:nrow(multiOrderPairsFlat)) {
    multiOrderPairsFlat[i, ] <- sort(multiOrderPairsFlat[i, ])
  }
})
# user system elapsed
# 11.99 0.00 12.00
all.equal(as.matrix(res), multiOrderPairsFlat, check.attributes=FALSE)
#[1] TRUE
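If the data.frame conversion is not wanted, the same idea can be applied to the matrix directly by swapping the two columns only in the rows that are out of order. This is a sketch that was not part of the timings above; sort_pairs is a hypothetical helper name, and it assumes a two-column character matrix like the one in the question.

```r
# Hypothetical helper: order each row of a two-column matrix
sort_pairs <- function(m) {
  stopifnot(ncol(m) == 2)
  # Indices of rows whose first entry sorts after the second
  # (which() drops rows where the comparison is NA)
  swap <- which(m[, 1] > m[, 2])
  # Reverse the two columns for those rows only, in a single assignment
  m[swap, ] <- m[swap, 2:1, drop = FALSE]
  m
}

m <- cbind(c("Pipe2mSteel", "Bushing1inS"),
           c("Bushing1inS", "Pipe2mSteel"))
sort_pairs(m)
```

The result stays a character matrix, and rows containing NA are left untouched rather than passed through sort().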
Checking the memory allocation with profvis:
library(profvis)
profvis({
  df1 <- as.data.frame(multiOrderPairsFlat, stringsAsFactors = FALSE)
  res <- data.frame(First = do.call(pmin, df1), Second = do.call(pmax, df1))
})
# 3.3 MB
profvis({
  for (i in 1:nrow(multiOrderPairsFlat)) {
    multiOrderPairsFlat[i, ] <- sort(multiOrderPairsFlat[i, ])
  }
})
# 12.8 MB
Data (1 million rows) used for the timings above:
set.seed(24)
multiOrderPairsFlat <- cbind(sample(c("Pipe2mSteel", "Bushing1inS"), 1e6, replace = TRUE),
                             sample(c("Pipe2mSteel", "Bushing1inS"), 1e6, replace = TRUE))
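pmin/pmax only cover the two-column case. If more columns were ever involved, one vectorized option is to order all cells by row index and then by value with a single order() call, and refill the matrix row-wise. This is a sketch under that assumption; sort_rows is a hypothetical name. t(apply(m, 1, sort)) gives the same result more simply, but it still calls sort() once per row.

```r
# Hypothetical helper: sort the values within every row of a character matrix
sort_rows <- function(m) {
  # Order all cells by their row index first, then by value within the row
  i <- order(as.vector(row(m)), as.vector(m))
  # Cells now come row by row in sorted order, so refill the matrix by row
  matrix(m[i], nrow = nrow(m), byrow = TRUE)
}

m3 <- rbind(c("c", "a", "b"),
            c("z", "y", "x"))
sort_rows(m3)
```

Here the first row becomes "a" "b" "c" and the second "x" "y" "z".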