Reputation: 21
I have a multivariate pdf P(x,y,z), and I need to randomly draw samples from it.
With a univariate pdf I would typically approximate the cdf with a spline, draw a random number between 0 and 1 and evaluate the spline:
from scipy import interpolate
import numpy
def P(x):
    return x**2
xs=numpy.linspace(0,1,101)
Px=P(xs)
cdf=numpy.cumsum(Px),numpy.sum(Px)
cdfspline=interpolate.splrep(cdf,xs)
randomx=interpolate.splev(numpy.random.random(1),cdfspline)
Upvotes: 1
Views: 2349
Reputation: 1580
This is not really a Python question, but a statistical analysis question.
First, define your interval and normalize your PDF. The 3-dimensional integral of the PDF over the interval should be 1. In your example, you did this after the fact by dividing cdf by sum(Px) (I assume the comma in that line was meant to be a slash).
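As a rough sketch of the normalization step, the integral can be estimated on a grid and divided out. The density pdf_unnorm, the unit-cube interval, and the helper normalization_constant below are all made up for illustration; none of them come from the question.

```python
import numpy as np

def pdf_unnorm(x, y, z):
    # Hypothetical unnormalized density, used only for illustration.
    return x**2 + y * z

def normalization_constant(f, lo=0.0, hi=1.0, n=50):
    # Midpoint rule on an n x n x n grid over the cube [lo, hi]^3.
    h = (hi - lo) / n
    mids = lo + h * (np.arange(n) + 0.5)
    X, Y, Z = np.meshgrid(mids, mids, mids, indexing="ij")
    return f(X, Y, Z).sum() * h**3

norm = normalization_constant(pdf_unnorm)

def pdf(x, y, z):
    # Normalized density: integrates to approximately 1 over the cube.
    return pdf_unnorm(x, y, z) / norm
```

A plain grid sum like this is only practical in a few dimensions and for reasonably smooth densities; for anything harder, a dedicated integrator would be the safer choice.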
Look up rejection sampling (for example in Wikipedia). If you use only rejection sampling, it isn't so critical to normalize the PDF, but you'd better make sure the functional form you compare against the random number doesn't exceed 1.
You can implement rejection sampling to give you one sample at a time by creating some number N of trial x,y,z points sampled uniformly, Uxyz = rand(3,N), and returning the first Uxyz[:,n] where 0.001*PDF(Uxyz[0,n], Uxyz[1,n], Uxyz[2,n]) > rand(1). The factor 0.001 stands for some small scale factor that keeps the scaled PDF at or below 1 everywhere on the interval (smaller factors get you fewer samples; a factor that is too large lets the scaled PDF exceed 1 and distorts the distribution).
With a vector tool like numpy it is more efficient to generate a larger number of random samples, and return all the x,y,z that pass the rejection sampling. The price of using rejection sampling is you won't know in advance how many samples you get from your distribution.
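A minimal sketch of the vectorized version might look like the following. The density pdf, its bound f_max, and the unit-cube interval are assumptions chosen for illustration; here the scale factor is 1/f_max rather than a guessed small number, which requires knowing (or bounding) the maximum of the PDF on the interval.

```python
import numpy as np

def pdf(x, y, z):
    # Hypothetical density on the unit cube, for illustration only.
    # It integrates to 1 and its maximum is 24/7, reached at (1, 1, 1).
    return (x**2 + y * z) * 12.0 / 7.0

def rejection_sample(f, f_max, n_trials, rng):
    # Draw n_trials candidate points uniformly in the unit cube.
    u = rng.random((3, n_trials))
    # Scale by 1/f_max so the acceptance probability never exceeds 1.
    accept = f(u[0], u[1], u[2]) / f_max > rng.random(n_trials)
    # Keep every candidate that passes; the count varies from run to run.
    return u[:, accept]

rng = np.random.default_rng(0)
samples = rejection_sample(pdf, f_max=24.0 / 7.0, n_trials=100_000, rng=rng)
print(samples.shape)  # (3, number of accepted points)
```

If a fixed number of samples is needed, one option is to call rejection_sample in a loop and concatenate the results until enough points have been collected.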
Details of how you do this most efficiently depend on the actual functional form of your multivariate PDF and the x,y,z interval of interest. For example, if you can separate P(x,y,z) into Px(x)*Py(y)*Pz(z) or even into Pxy(x,y)*Pz(z), that will make your task easier. The classic example is when the PDF looks like exp(-a*x**2 - b*y**2 - c*z**2).
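For the separable Gaussian case, each coordinate can be drawn on its own with no rejection step at all. The sketch below assumes a, b, c are positive and that the interval is all of space rather than a bounded box; the function name sample_separable_gaussian is made up for illustration.

```python
import numpy as np

def sample_separable_gaussian(a, b, c, n, rng):
    # exp(-a*x**2) is proportional to a normal density with variance 1/(2a),
    # so each coordinate can be drawn independently of the others.
    coeffs = np.array([a, b, c], dtype=float)
    sigmas = 1.0 / np.sqrt(2.0 * coeffs)
    # Row i holds n draws with standard deviation sigmas[i].
    return rng.normal(0.0, sigmas[:, None], size=(3, n))

rng = np.random.default_rng(0)
xyz = sample_separable_gaussian(a=1.0, b=2.0, c=0.5, n=50_000, rng=rng)
```

For a separable PDF that is not Gaussian, the same idea applies with the one-dimensional inverse-CDF approach from the question used for each factor in turn.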
Upvotes: 1