Reputation: 147
I have a theoretical and coding question that has to do with densities and simulating values.
I am building custom densities via the density(x) command. However I am hoping to generate 1000-10000 simulated values from this density. The overall goal is to take two densities build by density(x$y) form and run simulations and say this density A is more than density B x% of the time. I would just take each simulated value and see which is higher and code to count how many times A is higher than B.
Is there a way to accomplish this? Or is there some way to accomplish something similar with these densities? Thanks!
Upvotes: 1
Views: 1025
Reputation: 12819
Following @DWin's tip to invert the ecdf
, here is how to implement such an approach, using a spline to fit the inverted step-function:
Given
z <- c(rnorm(40), runif(40))
plot(density(z))
Define
spl <- with(environment(ecdf(z)), splinefun(y, x))
sampler <- function(n)spl(runif(n))
Now you can call sampler()
with the size you want:
plot(density(sampler(1000)))
Final note: This will never generate values outside the range of the original data, but duplicates will be extremely rare:
> anyDuplicated(sampler(1e4))
[1] 0
Upvotes: 1
Reputation: 263301
The sample
function can take the midpoints of the intervals of the sample density and then use the densities as the prob-arguments.
mysamp <- sample(x= dens$x, size=1000 , prob=dens$y, repl=TRUE)
This has the disadvantage that you may need to jitter the result to avoid lots of duplicates.
mysamp <- jitter(mysamp)
Another method is to use approxfun
and ecdf
. You may need to invert the function (reverse role of x and y) in order to sample using the input of runif(1000)
into the result. I'm pretty sure there are worked examples of this in SO and I'm pretty sure that I am one of many who in the past have posted such code to R-help. (If your searches have failed to find then then post the search strategies and others can try to improve upon them.)
Upvotes: 1