Jurriën
Jurriën

Reputation: 309

Multivariate Normal Distribution Matlab, probability area

I have 2 arrays: one with x-coordinates, the other with y-coordinates. Both are a normal distribution as a result of a Monte-Carlo simulation. I know how to find the sigma and mu for both array's, and get a 95% confidence interval:

[mu,sigma]=normfit(x_array);
hist(x_array);
x=norminv([0.025 0.975],mu,sigma)

However, both array's are correlated with each other. To plot the probability distribution of the combined array's, i use the multivariate normal distribution. In MATLAB this gives me:

[MuX,SigmaX]=normfit(x_array);
[MuY,SigmaY]=normfit(y_array);
mu = [MuX MuY];
Sigma=cov(x_array,y_array);
x1 = MuX-4*SigmaX:5:MuX+4*SigmaX; x2 = MuY-4*SigmaY:5:MuY+4*SigmaY;
[X1,X2] = meshgrid(x1,x2);
F = mvnpdf([X1(:) X2(:)],mu,Sigma);
F = reshape(F,length(x2),length(x1));
surf(x1,x2,F);
caxis([min(F(:))-.5*range(F(:)),max(F(:))]);
set(gca,'Ydir','reverse')
xlabel('x0-as'); ylabel('y0-as'); zlabel('Probability Density');

enter image description here

So far so good. Now I want to calculate the 95% probability area. I'am looking for a function as mndinv, just as norminv. However, such a function doesn't exist in MATLAB, which makes sense because there are endless possibilities... Does somebody have a tip about how to get a 95% probability area? Thanks in advance.

Upvotes: 6

Views: 2228

Answers (2)

Oleg
Oleg

Reputation: 10676

For the bivariate case you can add the ellispe whose area corresponds to NORMINV(95%). This ellipse is uniquely identified and for proof see the first source in the link.

% Suppose you know the distribution params, or you got them from normfit()
mu    = [3, 7];
sigma = [1, 2.5
         2.5  9];

% X/Y values for plotting grid
x = linspace(mu(1)-3*sqrt(sigma(1)), mu(1)+3*sqrt(sigma(1)),100);
y = linspace(mu(2)-3*sqrt(sigma(end)), mu(2)+3*sqrt(sigma(end)),100);

% Z values
[X1,X2] = meshgrid(x,y);
Z       = mvnpdf([X1(:) X2(:)],mu,sigma);
Z       = reshape(Z,length(y),length(x));

% Plot
h = pcolor(x,y,Z);
set(h,'LineStyle','none')
hold on

% Add level set
alpha = 0.05;
r     = sqrt(-2*log(alpha));
rho   = sigma(2)/sqrt(sigma(1)*sigma(end));
M     = [sqrt(sigma(1)) rho*sqrt(sigma(end))
         0              sqrt(sigma(end)-sigma(end)*rho^2)];


theta = 0:0.1:2*pi;
f     = bsxfun(@plus, r*[cos(theta)', sin(theta)']*M, mu);
plot(f(:,1), f(:,2),'--r')

enter image description here

Sources

Upvotes: 4

Ander Biguri
Ander Biguri

Reputation: 35525

To get the numerical value of F where the top part lies, you should use top5=prctile(F(:),95) . This will return the value of F that limits the bottom 95% of data with the top 5%.

Then you can get just the top 5% with

Ftop=zeros(size(F));
Ftop=F>top5;
Ftop=Ftop.*F;
%// optional: Ftop(Ftop==0)=NaN;
surf(x1,x2,Ftop,'LineStyle','none');

Upvotes: 1

Related Questions