Reputation: 3803
Suppose we have this matrix:
main = [10000 5 3 1;
5 5677 0 134;
1 1 456 3];
The following method is among the most widely used in econometrics and statistical problems. X is the data in which we are searching for outliers, and n is the number of standard deviations used as the threshold:
X-mean(X)>= n*std(X)
If this inequality holds, the sample is an outlier; otherwise we keep the sample.
Now my question: I want to find outliers with this code:
meann = mean(main);
stdd = std(main);
out = find(main-repmat(meann,size(main,1),1)>=repmat(2*stdd,size(main,1),1));
We are searching for outliers in every column. out should contain the indices of the outliers. In the final step, we should remove the outliers from every column. Is there a simpler function or method to do this in MATLAB?
Thanks.
Upvotes: 2
Views: 11409
Reputation: 2993
Use a cell array if you want to remove certain elements from different columns.
main = rand(100,4);
main(10,1) = 10000;
main(40,2) = 4321;
main([10,20,30],3)=[938;10;4];
mu = num2cell(mean(main));
sig = num2cell(std(main));
m = num2cell(main,1);
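% ind{k}: row indices of the samples in column k that are kept (within 2 std of the mean)
ind = cellfun(@(x,m,s) find( bsxfun(@lt, abs( bsxfun(@minus,x,m) ), 2*s) ),...
m, mu, sig, 'uni', 0);
% data{k}: the kept samples themselves, i.e. column k with its outliers removed
data = cellfun(@(x,m,s) x( bsxfun(@lt, abs( bsxfun(@minus,x,m) ), 2*s) ),...
m, mu, sig, 'uni', 0);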
P.S. Your example is too small, so there may not be enough samples to establish a threshold.
Upvotes: 2
Reputation: 45752
If you want to find values more than 2 standard deviations away from the mean on a per-column basis, I would use bsxfun rather than repmat, like this:
meann = mean(main)
stdd = std(main)
I = bsxfun(@gt, abs(bsxfun(@minus, main, meann)), 2*stdd)
I would stop at I, as this logical mask is all you need to remove the outliers. However, you can call find if you like:
out = find(I)
Although to me it is more intuitive to do this:
I = bsxfun(@lt, meann + 2*stdd, main) | bsxfun(@gt, meann - 2*stdd, main)
I think your repmat solution is missing an abs, by the way.
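As a minimal sketch of how the logical mask I could be used to actually remove the outliers: the names cleaned and kept are illustrative and not part of the answer above, and the sample data is made up. One option keeps the matrix shape and blanks the flagged entries with NaN; the other collects the surviving values of each column in a cell array, since the columns may end up with different lengths.

```matlab
% Illustrative data: 100 samples per column, with two planted outliers.
main = rand(100, 4);
main(10, 1) = 10000;
main(40, 2) = 4321;

meann = mean(main);
stdd  = std(main);
I = bsxfun(@gt, abs(bsxfun(@minus, main, meann)), 2*stdd);

% Option 1: keep the matrix shape and mark the outliers as NaN.
cleaned = main;
cleaned(I) = NaN;

% Option 2: drop the outliers column by column. Columns may differ in
% length afterwards, so the result is a cell array with one vector per column.
kept = cell(1, size(main, 2));
for j = 1:size(main, 2)
    kept{j} = main(~I(:, j), j);
end
```

The NaN variant is convenient when the rows must stay aligned across columns; the cell-array variant is closer to "removing" the samples outright.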
Upvotes: 3
Reputation: 1105
A 2*sigma criterion is certainly simple, but the mean and the standard deviation are themselves very sensitive to outliers. The out variable is therefore affected as well, and in fact your code doesn't find any outliers in the given matrix.
To detect the outliers you can simply compare the values appearing in your matrix against the median, or adopt more refined criteria. There is a beautiful lecture explaining this at https://www.mne.psu.edu/me345/Lectures/outliers.pdf
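A rough sketch of both points on the matrix from the question. The median-based rule here is one common choice and an assumption on my part, not something taken from the linked lecture: a value is flagged when it lies more than 3 scaled median absolute deviations (MAD) from its column median. The threshold 3, the scaling constant 1.4826, and the names I_sigma, I_med, absdev and madv are all illustrative.

```matlab
main = [10000 5 3 1;
        5 5677 0 134;
        1 1 456 3];

% 2-sigma rule: with only 3 samples per column, no value can lie more than
% (3-1)/sqrt(3), about 1.15, standard deviations from its column mean,
% so nothing is ever flagged.
I_sigma = bsxfun(@gt, abs(bsxfun(@minus, main, mean(main))), 2*std(main));

% Median-based rule (assumed thresholds): flag values more than 3 scaled
% MADs from the column median. The factor 1.4826 makes the MAD comparable
% to a standard deviation for normally distributed data.
med    = median(main);
absdev = abs(bsxfun(@minus, main, med));
madv   = 1.4826 * median(absdev);
I_med  = bsxfun(@gt, absdev, 3*madv);

out = find(I_med);   % linear indices of the flagged entries: 1, 5, 9, 11
```

Here I_sigma is all false, while I_med flags 10000, 5677, 456 and 134. Note that this rule breaks down when a column's MAD is zero. Recent MATLAB releases also provide isoutlier, which as far as I know uses a similar median-based rule by default.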
Upvotes: 4