NJmonkey
NJmonkey

Reputation: 21

How to calculate for each element in a vector the fraction of elements in another vector that is smaller?

n<-100000   
aa<-rnorm(n)
bb<-rnorm(n)
system.time(lapply(aa, function(z){mean(bb<pnorm(z))}))

It takes too long to run this small code. Simply put, I have two vectors aa and bb. For each element of aa, say aa[i], I want the proportion of bb < aa[i]

I found this article and tried to use it to speed up. But it does not work. Speed comparison of sapply with a composite function

Any help will be appreciated!

Upvotes: 2

Views: 713

Answers (3)

IRTFM
IRTFM

Reputation: 263362

If you only want the proportion ' < aa[i]' then you should just determine the number of bb less than than each value of aa and then divide by length:

bbs <- sort(bb)
zz <- findInterval(aa, bbs)
zz <- zz/length(aa)

It does what you say you want, while your code I fear does not.

Upvotes: 1

Andy
Andy

Reputation: 4866

You may be able to use the findInterval function:

n <- 25000
aa <- rnorm(n)
bb <- rnorm(n)
system.time(q1 <- lapply(aa, function(z){mean(bb<pnorm(z))}))
#   user  system elapsed
# 20.057   2.544  22.807
system.time(q2 <- findInterval(pnorm(aa), sort(bb))/n)
#   user  system elapsed
#  0.020   0.000   0.021
all.equal(as.vector(q1, "numeric"), q2)
# [1] TRUE

Note that findInterval returns indices, so I've divided the result by n. If you can sort pnorm(aa) before giving it to findInterval, it will be even faster.

Upvotes: 7

Brandon Bertelsen
Brandon Bertelsen

Reputation: 44648

I'm not meaning to be facetious but these are the types of problems that R is designed to solve without having to do every single calculation - ie, use statistics!

Assuming that the distributions are normal...

aa.new <- sample(aa, 1000)
bb.new <- sample(bb, 1000)

x <- lapply(aa.new, function(z){mean(bb.new<pnorm(z))})
x <- unlist(x)

mean(x)

You can be 99% certain that the proportion of bb < aa[i] falls between +/- 4% of mean(x).

For simple random sampling, 99% margin of error = 1.29/sqrt(n)

Upvotes: 1

Related Questions