Reputation: 1204
I have a matrix with a couple million rows and about 40 columns.
I want to sort the elements of each row in decreasing order, so that the highest value in each row ends up in the first column.
To do this I can use the apply function:
set.seed(1)
mm <- replicate(10, rnorm(20))  # random matrix with 20 rows and 10 columns
mm.sorted <- t(apply(mm, 1, sort, decreasing = TRUE))  # t() because apply returns one column per input row
But for a very large matrix this approach takes a very long time.
Are there other approaches that would speed up sorting the rows of a large matrix?
Upvotes: 4
Views: 1574
Reputation: 9666
Here is one clever way:
res <- matrix(mm[order(row(mm), -mm)], nrow = nrow(mm), byrow = TRUE)
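Why this works: row(mm) returns the row index of every element, so order(row(mm), -mm) arranges the elements first by row and then by decreasing value within each row; refilling a matrix with byrow = TRUE puts each sorted row back in place. A small illustration (the 2 x 3 matrix m is just for demonstration):
m <- matrix(c(3, 1, 2,
              9, 7, 8), nrow = 2, byrow = TRUE)
m[order(row(m), -m)]
# [1] 3 2 1 9 8 7  (row-wise, each row in decreasing order)
matrix(m[order(row(m), -m)], nrow = nrow(m), byrow = TRUE)
#      [,1] [,2] [,3]
# [1,]    3    2    1
# [2,]    9    8    7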
It is also faster than the other answers:
system.time(
  res <- matrix(mm[order(row(mm), -mm)], nrow = nrow(mm), byrow = TRUE)
)
#  user  system elapsed 
# 1.910   0.254   2.170
Upvotes: 1
Reputation: 132989
You could use package data.table:
set.seed(1)
mm <- matrix(rnorm(1000000 * 40, 0, 10), ncol = 40)
library(data.table)
system.time({
  d <- as.data.table(mm)
  d[, row := .I]
  d <- melt(d, id.vars = "row")  # wide to long format
  setkey(d, row, value)          # sort by row, then by value (ascending)
  # label values from largest (V1) to smallest (V40); a factor with explicit
  # levels keeps the dcast columns in V1..V40 order
  d[, variable := factor(paste0("V", ncol(mm):1), levels = paste0("V", 1:ncol(mm)))]
  # back to wide format and coerce to matrix
  msorted <- as.matrix(dcast(d, row ~ variable)[, row := NULL])
})
#user system elapsed
#4.96 0.59 5.62
If you could keep it as a long-format data.table (i.e., skip the last step), it would take about 2 seconds on my machine.
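A sketch of that long-format variant, stopping before the dcast (the rank column name is just illustrative):
d <- as.data.table(mm)
d[, row := .I]
d <- melt(d, id.vars = "row")
setkey(d, row, value)
d[, rank := ncol(mm):1, by = row]  # rank 1 = largest value within each row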
For comparison, timings of @qjgods' answer on my machine:
#user system elapsed
#3.71 2.08 8.81
Note that using apply (or a parallel version of it) transposes the matrix, which is why the call has to be wrapped in t() to restore the row orientation.
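You can verify the transposition from the dimensions (using the large mm from above; sorted is just an illustrative name):
sorted <- apply(mm, 1, sort, decreasing = TRUE)
dim(mm)      # 1000000 40
dim(sorted)  # 40 1000000 -- rows and columns are swapped, hence the t()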
Upvotes: 6
Reputation: 1000
Use the parallel package to speed it up:
library(parallel)
data <- matrix(rnorm(1000000 * 40, 0, 10), ncol = 40)
cl <- makeCluster(8)  # 8 = number of worker processes to use
system.time({
  res <- parApply(cl, data, 1, sort, decreasing = TRUE)  # note: result is transposed
})
#  user  system elapsed 
#  9.68   10.11   29.87
stopCluster(cl)
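If you prefer not to hard-code the number of workers, parallel::detectCores() reports how many cores the machine has, for example:
cl <- makeCluster(detectCores() - 1)  # leave one core free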
Upvotes: 7