Reputation: 25
Suppose the named elements of vectors, each stored in a list, should be assigned to the matching columns of a matrix (see the example below).
library(microbenchmark)
set.seed(123)
myList <- list()
for(i in 1:10000) {
  myList[[i]] <- list(sample(setNames(rnorm(5), sample(LETTERS[1:5])), ceiling(runif(1,1,4))))
}
myMatrix <- matrix(NA, ncol = 5, nrow = 10000)
colnames(myMatrix) <- LETTERS[1:5]
for(i in 1:10000) {
  myMatrix[i, match(names(myList[[i]][[1]]), colnames(myMatrix))] <- myList[[i]][[1]]
}
myList[[6]][[1]]
myMatrix[6,]
microbenchmark(for(i in 1:10000) {myMatrix[i, match(names(myList[[i]][[1]]), colnames(myMatrix))] <- myList[[i]][[1]]}, times = 10)
In this example, elements of 10,000 vectors are assigned to the matching columns of a matrix.
Problem
The assignment is slow (approximately 3.5 seconds)!
Question
How can I speed up this process in R or with Rcpp?
Upvotes: 0
Views: 178
Reputation: 132746
Use rbindlist from the data.table package. It can bind rows by matching column names.
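To see the name matching in isolation, here is a minimal sketch on two toy rows (the values and the names `rows` and `res` are made up for illustration and are not part of the question's data). With `fill = TRUE`, columns are matched by name and missing ones are padded with `NA`.

```r
library(data.table)

# Two toy rows with different, differently ordered names (illustrative data)
rows <- list(list(B = 2, A = 1), list(C = 3, A = 10))

# fill = TRUE matches columns by name and fills missing entries with NA;
# columns appear in order of first occurrence, here B, A, C
res <- rbindlist(rows, fill = TRUE)
print(res)
```

Because the column order follows first appearance, the result may need reordering if a fixed column order is required.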
library(microbenchmark)
n <- 10000
set.seed(123)
myList <- list()
for(i in 1:n) {
  myList[[i]] <- list(sample(setNames(rnorm(5), sample(LETTERS[1:5])), ceiling(runif(1,1,4))))
}
myMatrix <- matrix(NA, ncol = 5, nrow = n)
colnames(myMatrix) <- LETTERS[1:5]
library(data.table)
microbenchmark(
  match = for(i in 1:n) {myMatrix[i, match(names(myList[[i]][[1]]), colnames(myMatrix))] <- myList[[i]][[1]]},
  rbindlist = {
    # turn each named vector into a named list so every name becomes a column
    myMatrix1 <- as.matrix(rbindlist(lapply(myList,
                                            function(x) as.list(unlist(x))),
                                     fill = TRUE))
    # rbindlist orders columns by first appearance; sort them to A..E
    myMatrix1 <- myMatrix1[, order(colnames(myMatrix1))]
  },
  times = 10)
#Unit: milliseconds
#      expr        min         lq       mean     median         uq        max neval cld
#     match 1392.52949 1496.40382 1599.63584 1605.39080 1690.98410 1761.67322    10   b
# rbindlist   48.76146   50.29176   51.66355   51.10672   53.75465   54.93798    10  a
all.equal(myMatrix, myMatrix1)
#TRUE
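If you would rather avoid the extra dependency, the same assignment can probably also be done in base R with a single matrix-index assignment. This is only a sketch under the assumption that `myList` has the structure from the question (each `myList[[i]][[1]]` is a named numeric vector with unique names drawn from `LETTERS[1:5]`); `vecs`, `rows`, `cols` and `myMatrix2` are names introduced here, and I have not benchmarked it.

```r
set.seed(123)
n <- 1000  # smaller than the question's 10,000 to keep the sketch quick
myList <- vector("list", n)
for (i in seq_len(n)) {
  myList[[i]] <- list(sample(setNames(rnorm(5), sample(LETTERS[1:5])), ceiling(runif(1, 1, 4))))
}

# Pull out the named vectors (assumed structure: one vector per list element)
vecs <- lapply(myList, `[[`, 1)

# One (row, column) pair per value: the row is the list position,
# the column is found by matching the value's name against the column names
rows <- rep(seq_along(vecs), lengths(vecs))
cols <- match(unlist(lapply(vecs, names)), LETTERS[1:5])

# Assign all values in one step using a two-column index matrix
myMatrix2 <- matrix(NA_real_, nrow = n, ncol = 5, dimnames = list(NULL, LETTERS[1:5]))
myMatrix2[cbind(rows, cols)] <- unlist(vecs, use.names = FALSE)
```

The idea is to replace the 10,000 small row assignments with one vectorized assignment; whether that beats rbindlist on your data is something to measure with microbenchmark.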
Upvotes: 2