Reputation: 891
Say I have a = [2 7 4 9 2 4 999]
and I'd like to remove 999 from the vector (it is an obvious outlier).
Is there a general way to remove values like this? I have a set of vectors, and not all of them contain extreme values like that. prctile(a,99.5) will return the largest number in the vector no matter how extreme (or non-extreme) it is.
Upvotes: 4
Views: 18998
Reputation: 2201
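Filter your signal with a moving average:
% choose the window length
N = 10;
filtered = filter(ones(1,N)/N, 1, signal);
Find the noise (the residual between the signal and its smoothed version):
noise = signal - filtered;
Remove the noisy elements:
% choose the threshold
THRESH = 50;
signal = signal(abs(noise) < THRESH);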
This is better than the mean +/- n*stddev approach because it looks for local changes, so it won't fail on a slowly changing signal like [1 2 3 ... 998 998].
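As a minimal sketch, here are the steps above applied to the vector from the question. The name cleaned is introduced only for illustration, and N = 10 and THRESH = 50 are simply the values used in this answer; they would likely need tuning for real data.

```matlab
% Example vector from the question; 999 is the suspected outlier.
signal = [2 7 4 9 2 4 999];

% Moving-average filter of length N (causal, zero initial state).
N = 10;
filtered = filter(ones(1, N) / N, 1, signal);

% Residual between the raw signal and the smoothed one.
noise = signal - filtered;

% Keep only the samples whose residual stays below the threshold.
THRESH = 50;
cleaned = signal(abs(noise) < THRESH);
disp(cleaned)
```

Because filter starts from a zero initial state and the window is longer than this vector, the smoothed values here are just running sums divided by N, so the threshold has to be generous enough to tolerate that start-up effect.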
Upvotes: 1
Reputation: 26069
There are several ways to do that, but first you must define what "extreme" means. Is it above some threshold? Above some number of standard deviations?
Or, if you know you have exactly n of these extreme events and that their values are larger than the rest, you can use sort and then delete the last n elements, etc.
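A rough sketch of the sort-based idea, assuming n = 1 for the vector from the question. The names sorted, idx, and kept are made up for this example. The second variant is one possible way to preserve the original element order, which plain sorting does not.

```matlab
a = [2 7 4 9 2 4 999];
n = 1;  % assumed: the number of extreme values is known in advance

% Option 1: sort ascending and drop the last n entries (original order is lost).
sorted = sort(a);
sorted(end-n+1:end) = [];

% Option 2: locate the n largest entries and delete them in place,
% so the remaining elements keep their original order.
[~, idx] = sort(a, 'descend');
kept = a;
kept(idx(1:n)) = [];
```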
For example, a(a>threshold)=[] will take care of a threshold-like definition, while a(a>mean(a)+n*std(a))=[] will discard values that are more than n standard deviations above the mean of a.
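To make the two expressions concrete, here is a small sketch on the vector from the question. The threshold of 100 and n = 2 are arbitrary illustrative choices, and the copies b and c exist only so that a is left untouched.

```matlab
a = [2 7 4 9 2 4 999];

% Fixed threshold (100 is an arbitrary illustrative value).
threshold = 100;
b = a;
b(b > threshold) = [];

% Mean plus n standard deviations (n = 2 is an assumed choice).
n = 2;
c = a;
c(c > mean(c) + n * std(c)) = [];
```

Note that in a vector this short the outlier itself inflates both mean(a) and std(a): with n = 3 the cutoff would land above 999 and nothing would be removed, so n needs to be chosen with the sample size in mind.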
A completely different approach is to use the median of a. If the vector is as short as the one you mention, you can look at the median value and then discard anything above some factor of that value: a(a>n*median(a))=[].
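A minimal sketch of the median-based cutoff on the same vector. The factor of 5 is an assumed value, the names factor, cutoff, and d are illustrative, and a multiple of the median only makes sense as a cutoff when the data are positive.

```matlab
a = [2 7 4 9 2 4 999];

factor = 5;                    % assumed multiple of the median
cutoff = factor * median(a);   % median(a) is 4 here, so the cutoff is 20

d = a;
d(d > cutoff) = [];            % drop everything above the cutoff
```

Unlike the mean, the median is barely affected by the single extreme value, which is why this cutoff stays close to the bulk of the data.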
Last, a way to assess an approach to treat these spikes would be to take a histogram of the data, and work from there...
Upvotes: 11
Reputation: 977
I can think of two:
mean +/- (n * standard deviation)
In both cases n must be chosen by the user.
Upvotes: 2