Reputation: 123
I have a function with two arguments. The first argument takes a vector, and the second takes a scalar. I want to apply this function to each row of a matrix, but with a different second argument for each row. I tried the following, but it didn't work. I expected it to calculate the p-value for each row and then divide that p-value by the row number, giving a vector, but I got a matrix instead. This is a pseudo example, but it illustrates my purpose.
> foo = matrix(rnorm(100),ncol=20)
> f = function (x,y) t.test(x[1:10],x[11:20])$p.value/y
> goo = 1:5
> apply(foo,1,f,y=goo)
          [,1]      [,2]      [,3]       [,4]       [,5]
[1,] 0.9406881 0.6134117 0.5484542 0.11299535 0.20420786
[2,] 0.4703440 0.3067059 0.2742271 0.05649767 0.10210393
[3,] 0.3135627 0.2044706 0.1828181 0.03766512 0.06806929
[4,] 0.2351720 0.1533529 0.1371135 0.02824884 0.05105196
[5,] 0.1881376 0.1226823 0.1096908 0.02259907 0.04084157
The following for loop produces the expected result, except that it would be very slow on the real data.
> res = numeric(5)
> for (i in 1:5){
  res[i]=f(foo[i,],i)
}
> res
[1] 0.94068810 0.30670585 0.18281807 0.02824884 0.04084157
Any suggestions would be appreciated!
Upvotes: 1
Views: 1258
Reputation: 226057
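# Original version: divides each row's p-value by y
f <- function (x,y) t.test(x[1:10],x[11:20])$p.value/y
# Whole-matrix version: one t-test on elements 1:10 vs 11:20 of the matrix
# (column-major order), divided by every element of b; not a per-row test
f2 <- function(a, b){
  tt <- t.test(x = a[1:10], y = a[11:20])$p.value
  tt/b
}
# The OP's for loop, wrapped in a function so it can be benchmarked
f3 <- function() {
  res <- numeric(5)
  for (i in 1:5){
    res[i] <- f(foo[i,],i)
  }
  res
}
# p-value only; the division by goo is done afterwards, vectorized
f4 <- function(x) t.test(x[1:10], x[11:20])$p.value
set.seed(101)
foo <- matrix(rnorm(100),ncol=20)
goo <- 1:5
library(rbenchmark)
benchmark(
  apply(foo, 1, f4) / goo,
  mapply(f,split(foo,row(foo)),goo),
  f2(foo,goo),
  f3(),
  sapply(seq(nrow(foo)), function(i) f(foo[i,], goo[i])),
  replications=1000,
  columns=c("test","replications","elapsed","relative"))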
## test replications elapsed relative
## 1 apply(foo, 1, f4)/goo 1000 1.581 5.528
## 3 f2(foo, goo) 1000 0.286 1.000
## 4 f3() 1000 1.458 5.098
## 2 mapply(...) 1000 1.599 5.591
## 5 sapply(...) 1000 1.486 5.196
The direct division, f2(foo, goo), is fastest (but not actually applicable); for this example there's not much difference between the other solutions, but the for loop is slightly faster than sapply, which is slightly faster than mapply. You should try this on a more realistic example to see how it's going to scale for your problem.
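As a starting point for a more realistic test, here is a minimal sketch that scales the example up and first checks that the per-row approaches agree before timing anything. The row count n = 200 and the names foo_big and goo_big are hypothetical placeholders, and base system.time() is used so the sketch doesn't depend on rbenchmark being installed.

```r
set.seed(101)
n <- 200                          # hypothetical row count; use the real data's size
foo_big <- matrix(rnorm(n * 20), ncol = 20)
goo_big <- seq_len(n)
f <- function(x, y) t.test(x[1:10], x[11:20])$p.value / y

# for loop with a preallocated result vector
res_loop <- numeric(n)
for (i in seq_len(n)) res_loop[i] <- f(foo_big[i, ], goo_big[i])

# sapply over the row index
res_sapply <- sapply(seq_len(n), function(i) f(foo_big[i, ], goo_big[i]))

# mapply over a list of rows; split() names the elements "1", "2", ...
res_mapply <- unname(mapply(f, split(foo_big, row(foo_big)), goo_big))

# Make sure all three agree before comparing their speed
stopifnot(isTRUE(all.equal(res_loop, res_sapply)),
          isTRUE(all.equal(res_loop, res_mapply)))

# Base-R timing (rbenchmark::benchmark works too if it's installed)
system.time(sapply(seq_len(n), function(i) f(foo_big[i, ], goo_big[i])))
```

Since the t-test itself dominates each call, the differences between the looping styles may stay small as n grows; timing on the real data is the only reliable check.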
Upvotes: 1
Reputation: 42629
If your real purpose is like your example, you can vectorize the division:
f <- function(x) t.test(x[1:10], x[11:20])$p.value
apply(foo, 1, f) / goo
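A short sketch, using the same simulated data as the question, that checks this vectorized form against an explicit per-row loop:

```r
set.seed(101)
foo <- matrix(rnorm(100), ncol = 20)
goo <- 1:5
f <- function(x) t.test(x[1:10], x[11:20])$p.value

# One p-value per row, then elementwise division by goo
vectorized <- apply(foo, 1, f) / goo

# Same computation with an explicit loop, for comparison
res <- numeric(nrow(foo))
for (i in seq_len(nrow(foo))) res[i] <- f(foo[i, ]) / goo[i]

all.equal(vectorized, res)   # TRUE
```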
Based on the comment, the above is not appropriate.
In the case of the example, you might observe that the diagonal of the returned matrix is the desired result:
f = function (x,y) t.test(x[1:10],x[11:20])$p.value/y
goo = 1:5
diag(apply(foo,1,f,y=goo))
Besides being inefficient in time and space (it builds a full matrix only to keep its diagonal), this suffers from another problem: it gives the correct answer for the example only because the operation on y is vectorized. And in that case, the former solution is better. So I suspect that in your actual problem, the operation is not vectorized.
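To see why the diagonal works (and why apply() returned a matrix in the first place), here is a small sketch: apply() passes y = goo whole to every call, so each row yields a length-5 vector, and apply() binds those vectors as columns. The helper p is introduced only for comparison.

```r
set.seed(101)
foo <- matrix(rnorm(100), ncol = 20)
goo <- 1:5
f <- function(x, y) t.test(x[1:10], x[11:20])$p.value / y

# Each call returns p / goo (length 5); apply() makes these the columns
m <- apply(foo, 1, f, y = goo)

# Plain per-row p-values, for comparison
p <- apply(foo, 1, function(x) t.test(x[1:10], x[11:20])$p.value)

dim(m)                          # 5 5
all.equal(m[, 2], p[2] / goo)   # column i is p[i] divided by all of goo
all.equal(diag(m), p / goo)     # the diagonal holds p[i] / goo[i]
```

The last line is only TRUE because "/" is vectorized over y; an f that cannot take a whole vector as y would not give this result.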
Sometimes a for loop really is the best answer. The apply family of functions is not magical; they are still loops.
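One way to make this concrete is to count calls. In this sketch, p_value_counted is a hypothetical wrapper that increments a global counter each time apply() invokes it:

```r
set.seed(101)
foo <- matrix(rnorm(100), ncol = 20)

# Hypothetical counter: how many times does apply() call FUN?
calls <- 0
p_value_counted <- function(x) {
  calls <<- calls + 1
  t.test(x[1:10], x[11:20])$p.value
}

p <- apply(foo, 1, p_value_counted)
calls   # 5: one call per row, just as a for loop would make
```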
Here is an sapply solution. It won't beat the for loop for speed (though it probably won't lose either), but it doesn't have a high space overhead. The idea is to apply over the row index and use it to extract the row of foo and the element of goo to pass to f:
sapply(seq(nrow(foo)), function(i) f(foo[i,], goo[i]))
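A slightly more defensive variant of the same idea, offered as a sketch: vapply() with FUN.VALUE = numeric(1) checks that every call returns a single double, and seq_len() avoids the seq(0) pitfall (seq(0) is c(1, 0), not an empty sequence). The wrapper name row_apply is hypothetical.

```r
set.seed(101)
foo <- matrix(rnorm(100), ncol = 20)
goo <- 1:5
f <- function(x, y) t.test(x[1:10], x[11:20])$p.value / y

# Hypothetical helper: apply f to row i of m with the i-th element of g
row_apply <- function(m, g) {
  vapply(seq_len(nrow(m)),                # integer(0) when m has no rows
         function(i) f(m[i, ], g[i]),
         FUN.VALUE = numeric(1))          # each call must return one double
}

res <- row_apply(foo, goo)
res0 <- row_apply(foo[0, , drop = FALSE], integer(0))   # numeric(0), no error
```

On a zero-row matrix, the seq(nrow(foo)) version would instead try to index rows 1 and 0.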
Upvotes: 2