Reputation: 2299
Consider a 5-variate cumulative distribution function (cdf) which I call F.
I want to sample random 5x1 vectors from this cdf in Matlab. F is not a cdf that has been already implemented in Matlab (such as normal, t-student, etc.). Specifically, it is defined as
I have read several question/answers in this and other forums on how to sample from customised probability distribution functions in Matlab. However,
1) Most of them are for univariate cdf, e.g. here. The idea is to apply inverse transform sampling. My problem is a bit more complicated because I would need to "invert"a 5-variate function.
2) Another option could be to use slicesample as suggested here but I don't know how to write the analytical expression of the probability density function in my case.
3) Here's another idea but specific for the bivariate case.
Could you kindly help me to understand how I can proceed?
Upvotes: 3
Views: 2716
Reputation: 60494
Your link under #3 gives a hint of the solution. It explains the bivariate case when you have a PDF. Here we'll extend this to any number of dimensions, for the case when you have a CDF.
So the process is:
Note that if you have a PDF, computing marginal distributions involves integrating over the remaining variables. So a marginal distribution for r1 requires integrating over r2..r5, a marginal distribution for r2 given r1 requires integrating over r3..r5, etc.
When you have a CDF, computing the marginal distributions is trivial, as it already integrates the PDF: the marginal distribution for r1 is F(x,∞,∞,∞,∞). However, obtaining the marginal distribution given one or more variables requires differentiating: a marginal distribution for r2 given r1 requires differentiating along r1, a marginal distribution for r3 given r1 and r2 requires differentiating along r1 and r2, etc.
It might be possible to obtain these derivatives analytically (this would be the more efficient solution). Here we use the finite difference derivative approximation instead (this makes it easier to plug in any CDF).
Let's see some MATLAB code:
sigma_a = 0.5;
sigma_b = 0.3;
F = @(r1,r2,r3,r4,r5)exp(-exp(-r1) - (exp(-r2/sigma_a)+exp(-r3/sigma_a)).^sigma_a ...
- (exp(-r4/sigma_b)+exp(-r5/sigma_b)).^sigma_b);
lims = [-5,10]; % This is the area along all dimensions containing 99.99% of the PDF
N = 1000;
values = zeros(N,5);
for n=1:N
values(n,:) = sample_random(F,5,lims);
end
Here I picked some random values for sigma_a
and sigma_b
, and used those to define a function F
of 5 variables r1
..r5
. I figured that the domain of the PDF is the same along all dimensions, I found a region slightly larger than really necessary (lims
). Next, I obtain 1000 random samples from the distribution F
by calling sample_random
:
function r = sample_random(F,N,lims)
delta = diff(lims)/10000;
x = linspace(lims(1),lims(2),300);
r = inf(1,N);
for ii = 1:N
marginal = get_marginal(F,r,ii,x,delta);
p = rand * marginal(end);
[~,I] = unique(marginal); % interp1 cannot handle duplicated points, let's remove them
r(ii) = interp1(marginal(I),x(I),p);
end
delta
is the distance we'll use for the finite difference approximation to the derivative. x
represents sample points along any one dimension of F
.
We first define r
as the vector [inf,inf,inf,inf,inf]
, we'll use this as the sample locations, and at the end of the function it will contain the random value drawn from our distribution.
Next we loop over the 5 dimensions, in each iteration we sample the marginal distribution for dimension ii
, given the values for the previous dimensions (which have already been picked). The function get_marginal
is below. We pick a random value in between 0 and the max of this marginal CDF (note that the max decreases as we pick values of r
for each dimension, when ii==1
the max is 1), and we use this random value to interpolate into the inverse sampled marginal CDF (inverse simply means swapping x and y). I needed to remove repeated values from marginal
because it becomes the x
in interp1
, and this function requires the x
values to be unique.
Finally, the function get_marginal
:
function marginal = get_marginal(F,r,ii,x,delta)
N = length(r);
marginal = zeros(size(x));
for jj=0:2^(ii-1)-1
rr = flip(dec2bin(jj,N)-'0');
sign = mod(sum(rr,2),2);
if sign == 0
sign = 1;
else
sign = -1;
end
args = num2cell(r - delta * rr);
args{ii} = x;
marginal = marginal + sign * F(args{:});
end
This contains quite a bit of complication. It samples the CDF along a given dimension ii
, at the points x
, given the fixed values r(1:ii-1)
.
The complication comes from computing partial derivatives. If we were to compute the marginal distribution for any one dimension without having picked any fixed values yet, we would simply do e.g.
marginal = F(inf,x,inf,inf,inf);
Having picked one value, we would do
marginal = F(r1,x,inf,inf,inf) - F(r1-delta,x,inf,inf,inf);
(this is the approximation to the partial derivative along the first dimension).
The code in get_marginal
does this for ii-1
fixed values. This requires sampling F
twice for each of these fixed values, as well as for each combination of delta
shifts, a total of n^2
times (for n
fixed values). The dec2bin
bit is to get all these combinations. sign
determines whether to add or subtract the given sample from the running total. args
is a cell array with the 5 arguments to the function F
, elements 1:ii-1
are the fixed values, element ii
is set to x
, and elements ii+1:N
are inf
.
Finally, I draw the marginal distributions of the data set values
(which contains the 1000 elements randomly drawn from the CDF), and overlay with the marginal distributions of the CDF, to verify all is correct:
lims = [-2,5];
x = linspace(lims(1),lims(2),300);
figure
for ii=1:5
subplot(5,1,ii)
histogram(values(:,ii),'normalization','cdf','BinLimits',lims)
hold on
args = num2cell(inf(1,5));
args{ii} = x;
plot(x,F(args{:}))
text(5.2,0.5,['r_',num2str(ii)])
end
Upvotes: 6