puslet88
puslet88

Reputation: 1390

R: Generate data from a probability density distribution

Say I have a simple array, with a corresponding probability distribution.

library(stats)    
data <- c(0,0.08,0.15,0.28,0.90)
pdf_of_data <- density(data, from= 0, to=1, bw=0.1)

Is there a way I could generate another set of data using the same distribution. As the operation is probabilistic, it need not exactly match the initial distribution anymore, but will be just generated from it.

I did have success finding a simple solution on my own. Thanks!

Upvotes: 15

Views: 10456

Answers (3)

Anders Ellern Bilgrau
Anders Ellern Bilgrau

Reputation: 10253

From the examples in the documentation of ?density you (almost) get the answer.

So, something like this should do it:

library("stats")    
data <- c(0,0.08,0.15,0.28,0.90)
pdf_of_data <- density(data, from= 0, to=1, bw=0.1)

# From the example.
N <- 1e6
x.new <- rnorm(N, sample(data, size = N, replace = TRUE), pdf_of_data$bw)

# Histogram of the draws with the distribution superimposed.
hist(x.new, freq = FALSE)
lines(pdf_of_data)

Imgur

You can just reject the draws outside your interval as in rejection sampling. Alternatively, you can use the algorithm described in the link.

Upvotes: 9

user295691
user295691

Reputation: 7248

Your best bet is to generate the empirical cumulative density function, approximate the inverse, and then transform the input.

The compound expression looks like

random.points <- approx(
  cumsum(pdf_of_data$y)/sum(pdf_of_data$y),
  pdf_of_data$x,
  runif(10000)
)$y

Yields

hist(random.points, 100)

enter image description here

Upvotes: 13

Neal Fultz
Neal Fultz

Reputation: 9696

To draw from the curve:

sample(pdf_of_data$x, 1e6, TRUE, pdf_of_data$y)

Upvotes: 4

Related Questions