Reputation: 109
I'm trying to make the following call to outer()
substantially faster. Parallelizing via foreach
is still prohibitively slow, so I'd like to attempt calling this in C++ using Rcpp but would love to hear any faster alternative.
Given a matrix mat
and a list of matrix colnames col.list
I am summarizing mat
as such.
mycall <- function(mat, col.list) {
outer(
rownames(mat),
col.list,
Vectorize(function(x,y) {
mean(mat[x,y])
})
)
}
For instance:
set.seed(123)
mat <- matrix(rnorm(100),nrow=10)
rownames(mat) <- letters[1:10]
colnames(mat) <- LETTERS[1:10]
mat
A B C D E F G H I J
a -0.56047565 1.2240818 -1.0678237 0.42646422 -0.69470698 0.25331851 0.37963948 -0.4910312 0.005764186 0.9935039
b -0.23017749 0.3598138 -0.2179749 -0.29507148 -0.20791728 -0.02854676 -0.50232345 -2.3091689 0.385280401 0.5483970
c 1.55870831 0.4007715 -1.0260044 0.89512566 -1.26539635 -0.04287046 -0.33320738 1.0057385 -0.370660032 0.2387317
d 0.07050839 0.1106827 -0.7288912 0.87813349 2.16895597 1.36860228 -1.01857538 -0.7092008 0.644376549 -0.6279061
e 0.12928774 -0.5558411 -0.6250393 0.82158108 1.20796200 -0.22577099 -1.07179123 -0.6880086 -0.220486562 1.3606524
f 1.71506499 1.7869131 -1.6866933 0.68864025 -1.12310858 1.51647060 0.30352864 1.0255714 0.331781964 -0.6002596
g 0.46091621 0.4978505 0.8377870 0.55391765 -0.40288484 -1.54875280 0.44820978 -0.2847730 1.096839013 2.1873330
h -1.26506123 -1.9666172 0.1533731 -0.06191171 -0.46665535 0.58461375 0.05300423 -1.2207177 0.435181491 1.5326106
i -0.68685285 0.7013559 -1.1381369 -0.30596266 0.77996512 0.12385424 0.92226747 0.1813035 -0.325931586 -0.2357004
j -0.44566197 -0.4727914 1.2538149 -0.38047100 -0.08336907 0.21594157 2.05008469 -0.1388914 1.148807618 -1.0264209
col.list <- replicate(5, sample(colnames(mat),sample(10,1)), simplify = F)
col.list
[[1]]
[1] "I" "H" "F" "C"
[[2]]
[1] "H" "C" "E" "D"
[[3]]
[1] "F" "A" "B" "C"
[[4]]
[1] "I" "G" "H" "F"
[[5]]
[1] "B" "F" "A" "D" "J"
mycall(mat, col.list)
[,1] [,2] [,3] [,4] [,5]
[1,] -0.32494304 -0.45677441 -0.03772476 0.03692275 0.46737855
[2,] -0.54260254 -0.75753314 -0.02922133 -0.61368967 0.07088301
[3,] -0.10844910 -0.09763415 0.22265121 0.06475016 0.61009334
[4,] 0.14372171 0.40224937 0.20522554 0.07130067 0.36000416
[5,] -0.43982636 0.17912380 -0.31934091 -0.55151435 0.30598183
[6,] 0.29678266 -0.27389757 0.83293885 0.79433814 1.02136588
[7,] 0.02527506 0.17601171 0.06195023 -0.07211925 0.43025291
[8,] -0.01188734 -0.39897791 -0.62342288 -0.03697956 -0.23527315
[9,] -0.28972770 -0.12070775 -0.24994491 0.22537340 -0.08066115
[10,] 0.61991819 0.16277087 0.13782578 0.81898563 -0.42188074
Upvotes: 1
Views: 139
Reputation:
You could try:
sapply(col.list, function(v) rowMeans(mat[, v]))
I suspect the reason your solution is slow is Vectorize
: it's a nice way to transform a scalar function into a vectorized function, but it has a huge cost: since it's based on mapply
, it will call the function on each element, one by one. That is, one call to mean
for each entry. If the outer
result is large, that's going to be very costly. Instead, with the solution above, the code is at least vectorized in one direction, thanks to rowMeans
.
Upvotes: 1