Reputation: 35
Starting from a covariance matrix (3x3 COVAR) and the corresponding mean vector (3x1 OMEGA), I can generate contour plots of the credible regions (1-sigma and 2-sigma). These regions also include negative values (estimates), which I need to avoid.
In other words, I can sample from the above-mentioned multivariate distribution as
samples = mvnrnd(OMEGA,COVAR,10000);
% this results in a 10000x3 data set
From this sample, I would like to remove every row that contains a negative value, so that all entries in each remaining row are positive. I thought of two ways to get rid of the negative values: 1) eliminate any row containing at least one negative entry, or 2) replace all negative values by 0.0. This is what I am trying:
samples = mvnrnd(OMEGA,COVAR,100000);
flag = 1;   % 1 until the first row has been stored in data
for iRow = 1:size(samples,1)
    a = samples(iRow,:);
    % Option 1: eliminate rows containing a negative entry
    % if (~any(a<0) && flag==1)
    %     flag = 0;
    %     data = a;
    % elseif (~any(a<0) && flag==0)
    %     data = [data; a];
    % end
    % Option 2: replace negative entries by 0.0
    if flag == 1
        flag = 0; a(a<0) = 0; data = a;
    else
        a(a<0) = 0; data = [data; a];
    end
end
After running this code I recompute the mean and covariance of the data, only to find that they differ a lot from the original values:
% new mean and new covariance
OMEGA = mean(data);
COVAR = cov(data);
Could anybody suggest a better method? The idea is to reduce the magnitude of the covariance matrix (the errors) by getting rid of the negative (unphysical) values. One could then draw the contour plot from the new mean and covariance matrix, restricted to the first quadrant only (i.e. only positive values of the parameters).
Thank you in advance.
Upvotes: 0
Views: 233
Reputation: 991
The mean and variance of the changed data-set will differ from the original because you are removing the values that were originally part of the distribution. Say you have a sequence of numbers like -4, -5, 2, 8, 9, with a mean of 2. Now if you remove the negative parts completely, the mean will become 19/3, and if you replace the negative values by zero, the mean will become 19/5. This is what is happening in your case too.
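The arithmetic in this example can be checked directly. The snippet below is only a small sketch; the variable names (x, m_orig, m_removed, m_zeroed) are made up for illustration and are not from your code.

```matlab
x = [-4 -5 2 8 9];            % the example sequence from above

m_orig = mean(x);             % mean of the full sequence

m_removed = mean(x(x >= 0));  % mean after dropping the negative values

x_zero = x;
x_zero(x_zero < 0) = 0;       % replace the negative values by 0
m_zeroed = mean(x_zero);      % mean after zeroing

fprintf('original: %g, removed: %g, zeroed: %g\n', m_orig, m_removed, m_zeroed);
```

Both modified means are larger than the original one, because only values below the mean were taken out or raised.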
However, you can reduce the difference with your 2nd approach of setting the negative values to 0: in the example above, 19/5 is closer to the original mean of 2 than 19/3 is. Your code works correctly for both cases, but you might want to use the code below instead. It is vectorized, so it avoids growing data inside a loop and will be faster:
Approach 1 (Removing the negative rows completely)
% Keep only the rows in which no entry is negative (one-line solution)
data = samples(~any(samples < 0, 2), :);
% Equivalent, longer version using arrayfun:
% fn = @(x) any(samples(x, :) < 0);             % true if row x has a negative value
% neg_rows = arrayfun(fn, 1:size(samples, 1));  % run fn over all rows of samples
% data = samples(~neg_rows, :);                 % remove the negative rows
Approach 2 (Making all the negative values 0)
data = samples; % Copies all the samples to data
data(data < 0) = 0; % Makes all the negative values in data 0
Now you can compare the mean and covariance of the two new data sets with the original values to see which approach gives results closer to what you need.
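A rough sketch of such a comparison is given below. The values of OMEGA and COVAR here are invented placeholders, not your data, and the names data_removed and data_zeroed are only for illustration. To keep the snippet free of toolbox dependencies it draws the samples with randn and chol instead of mvnrnd; with your own setup you would keep the mvnrnd call.

```matlab
% Hypothetical mean and covariance (placeholders, replace with your own)
OMEGA = [1.0 0.5 2.0];
COVAR = [1.0 0.3 0.2;
         0.3 0.5 0.1;
         0.2 0.1 2.0];
n = 100000;

% Draw n samples: randn(n,3)*chol(COVAR) has covariance COVAR
% (stand-in for mvnrnd(OMEGA, COVAR, n))
samples = repmat(OMEGA, n, 1) + randn(n, 3) * chol(COVAR);

% Approach 1: drop every row that contains a negative entry
data_removed = samples(~any(samples < 0, 2), :);

% Approach 2: set every negative entry to 0
data_zeroed = samples;
data_zeroed(data_zeroed < 0) = 0;

% Compare the means and covariances with the original sample
fprintf('rows kept by approach 1: %d of %d\n', size(data_removed, 1), n);
disp('mean (original / removed / zeroed):');
disp([mean(samples); mean(data_removed); mean(data_zeroed)]);
disp('variances, i.e. diag of cov (original / removed / zeroed):');
disp([diag(cov(samples))'; diag(cov(data_removed))'; diag(cov(data_zeroed))']);
```

Note that approach 1 changes the number of rows while approach 2 keeps all of them, so the two data sets are not the same size when you compare them.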
Upvotes: 1