Reputation: 21
n<-100000
aa<-rnorm(n)
bb<-rnorm(n)
system.time(lapply(aa, function(z){mean(bb<pnorm(z))}))
This small piece of code takes too long to run. Simply put, I have two vectors aa and bb. For each element of aa, say aa[i], I want the proportion of bb < aa[i].
I found this article and tried to use it to speed things up, but it did not work: Speed comparison of sapply with a composite function
Any help will be appreciated!
Upvotes: 2
Views: 713
Reputation: 263362
If you only want the proportion of bb < aa[i], then you should just count the number of bb values less than each value of aa and divide by the length:
bbs <- sort(bb)               # sort bb once
zz <- findInterval(aa, bbs)   # for each aa[i], the count of bb values <= aa[i]
zz <- zz/length(bb)           # turn counts into proportions
It does what you say you want, while your code, I fear, does not: your lapply call compares bb against pnorm(aa[i]) rather than against aa[i] itself.
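As a quick sanity check (a minimal sketch added here, not part of the original answer; the *_small objects are illustrative stand-ins), you can compare the findInterval result against the naive per-element definition on a small example:
set.seed(1)
aa_small <- rnorm(500)            # small example so the naive version stays fast
bb_small <- rnorm(500)
fast  <- findInterval(aa_small, sort(bb_small)) / length(bb_small)
naive <- vapply(aa_small, function(a) mean(bb_small < a), numeric(1))  # direct definition
all.equal(fast, naive)  # TRUE up to ties, which have probability zero for continuous draws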
Upvotes: 1
Reputation: 4866
You may be able to use the findInterval function:
n <- 25000
aa <- rnorm(n)
bb <- rnorm(n)
system.time(q1 <- lapply(aa, function(z){mean(bb<pnorm(z))}))
# user system elapsed
# 20.057 2.544 22.807
system.time(q2 <- findInterval(pnorm(aa), sort(bb))/n)
# user system elapsed
# 0.020 0.000 0.021
all.equal(as.vector(q1, "numeric"), q2)
# [1] TRUE
Note that findInterval returns indices, so I've divided the result by n. If you can sort pnorm(aa) before giving it to findInterval, it will be even faster.
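A minimal sketch of that last point (my addition, reusing aa, bb, n, and q2 from above; sorting reorders the output, so the permutation from order() is kept to map results back to the original order):
ord <- order(aa)                                        # pnorm is monotone, so pnorm(aa[ord]) is sorted
q3 <- numeric(length(aa))
q3[ord] <- findInterval(pnorm(aa[ord]), sort(bb)) / n   # fill results back in original order
all.equal(q3, q2)                                       # should be TRUE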
Upvotes: 7
Reputation: 44648
I don't mean to be facetious, but these are the types of problems that R is designed to solve without having to do every single calculation; i.e., use statistics!
Assuming that the distributions are normal...
aa.new <- sample(aa, 1000)   # random subsample of aa
bb.new <- sample(bb, 1000)   # random subsample of bb
x <- lapply(aa.new, function(z){mean(bb.new < pnorm(z))})
x <- unlist(x)
mean(x)                      # estimated average proportion
You can be 99% certain that the proportion of bb < aa[i] falls within +/- 4% of mean(x).
For a proportion estimated by simple random sampling, the worst-case 99% margin of error is 2.576 * 0.5/sqrt(n), about 1.29/sqrt(n), which is roughly 4% for n = 1000.
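A quick check of that figure (a minimal sketch, my addition):
z99 <- qnorm(0.995)              # ~2.576
z99 * sqrt(0.5 * 0.5 / 1000)     # ~0.041, i.e. roughly +/- 4% for n = 1000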
Upvotes: 1