tgv

Reputation: 3

eliminating noise/spikes

I have measurement data with similar positive and negative values, which should look like this:

ReqData=[0 0 -2 -2 -2 -2 -2 -2 0 0 0 -2 -2 -2 -2 0 0 2 2 2 2 2 2 0 0 2 2 2 2 2 0 0 2 2 2 2 2 0 0 2 2 2 0 0]'

However, there is some measurement noise in the data, so the real data looks like this:

RealData=[0 0 -2 -2 -2 -2 -2 -2 0 0 0 -2 -2 -2 -2 0 0 2 2 2 2 -4 -1 0 0 2 2 2 2 -7 0 0 2 2 2 2 -1 0 0 2 2 2 0 0]'
  1. How do I remove the noise at the end of each run in RealData and convert it into ReqData using MATLAB?
  2. How do I find the start and stop indexes of each run of positive or negative data and split them using MATLAB? For instance, ansNegative = [3, 8, 12, 15]' and ansPositive = [18, 23, 26, 30, 33, 37, 40, 42]'.

Upvotes: 0

Views: 2677

Answers (3)

Egon

Reputation: 4787

As you mailed me another data set which is quite different from the one you posted here, I will explain another method to deal with your data.

For completeness, you can find a detail of the data set in the figure below:

[Figure: New dataset]

On the left you see the complete data set, and on the right is a detail. In contrast to the previous data set, we see that each peak is not at a constant level and we also do not need to interpolate in a nearest-neighbor sense as was the case before.

First of all, my previous answer runs very slowly on the complete data set (bad coding on my part). More importantly, it will most likely give poor results, because not all peaks would be projected to the right value: imagine that the most frequently occurring values are 4, 4.1 and 0 (followed by -4.05 and others). That would cause my previous algorithm to fail.

To circumvent this, choose two threshold levels and build a predictor from them: everything larger than the positive threshold is regarded as a constant positive value, everything smaller than the negative threshold is regarded as a constant negative value, and everything in between is regarded as zero.

By selecting decent thresholds, you can get quite a robust reconstruction:

[Figure: Reconstructed signal]

You can see the reconstruction in green, while the thresholds are displayed as dashed red lines. It should be possible to automatically select those thresholds depending on the actual data; but I leave that up to you (take a look at my previous code to get an idea how you can tackle this).

The corresponding source code:

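% Samples below the negative threshold belong to the negative level
thresholdNeg = -3;
idxNeg   = RealData<thresholdNeg;
valueNeg = mean(RealData(idxNeg));   % constant used for the negative level

% Samples above the positive threshold belong to the positive level
thresholdPos = 3;
idxPos   = RealData>thresholdPos;
valuePos = mean(RealData(idxPos));   % constant used for the positive level

% Everything between the thresholds stays at zero
reconData         = zeros(size(RealData));
reconData(idxPos) = valuePos;
reconData(idxNeg) = valueNeg;

n = numel(reconData);

% Original signal (blue), reconstruction (green), thresholds (dashed red)
plot(RealData,'b'); hold on;
plot(reconData,'-gx'); 
plot([1 n NaN n 1],[thresholdPos thresholdPos NaN thresholdNeg thresholdNeg], 'r--');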

edit: If you want to retain any information contained in the high and low levels of the signal (if you zoom in on the approximately constant levels, you can see that another signal is present), you can use an inverse thresholding technique: keep the signal everywhere, but set it to 0 wherever it lies between the thresholds.
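The inverse thresholding idea can be sketched as follows. This is only an illustration: the short sample vector is hypothetical (the emailed data set is not reproduced here), and the thresholds of -3 and 3 are simply reused from the code above.

```matlab
% Hypothetical sample: levels near -4, 0 and 4 with a small ripple on top
RealData = [0.1 -0.2 4.1 3.9 4.2 0.3 -0.1 -4.2 -3.8 -4.0 0.2]';

thresholdNeg = -3;
thresholdPos = 3;

% Samples lying between the two thresholds are treated as "zero level"
inBand = RealData >= thresholdNeg & RealData <= thresholdPos;

% Keep the signal as measured outside the band, force it to 0 inside
invData         = RealData;
invData(inBand) = 0;
```

Unlike the reconstruction with constant levels, this keeps the ripple on the high and low levels and only flattens the samples near zero.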

Upvotes: 1

Egon

Reputation: 4787

You can use something like the following code to reconstruct your data.

It first determines which amplitudes occur most frequently. It assumes that the 3 most frequent amplitudes are the correct ones; you can always impose slightly different constraints (e.g. check whether two of them have the same absolute value and always include).

Then it finds the sample points where the signal has a different amplitude and replaces each of them with the previous amplitude of the signal.

clc; clear all; close all;

ReqData=[0 0 -2 -2 -2 -2 -2 -2 0 0 0 -2 -2 -2 -2 0 0 2 2 2 2 2 2 0 0 2 2 2 2 2 0 0 2 2 2 2 2 0 0 2 2 2 0 0]';
RealData=[0 0 -2 -2 -2 -2 -2 -2 0 0 0 -2 -2 -2 -2 0 0 2 2 2 2 -4 -1 0 0 2 2 2 2 -7 0 0 2 2 2 2 -1 0 0 2 2 2 0 0]';

ReconData = RealData;

% Count how often each distinct amplitude occurs
% ("counts" avoids shadowing the built-in histogram function)
amplitudes = unique(RealData);
counts = hist(RealData,amplitudes);
[counts, sorted] = sort(counts);
amplitudes = amplitudes(sorted);

% Treat the 3 most frequent amplitudes as the valid signal levels
allowedValues = amplitudes(end-2:end);
%allowedValues = [-1 0 1] * 2;

% Replace every spike with the (already corrected) previous sample
spikes = find(~ismember(RealData,allowedValues));
for iSpike = 1:numel(spikes)
    jSpike = spikes(iSpike);
    if jSpike > 1   % a spike at the very first sample has no predecessor
        ReconData(jSpike) = ReconData(jSpike-1);
    end
end

% Required data (red), measured data (blue), reconstruction (green)
plot(ReqData,'-or'); hold on;
plot(RealData,'b');
plot(ReconData,'-gx');
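The same "hold the previous valid sample" step can also be written without a loop. The sketch below is one possible way to do it, assuming the allowed levels are already known to be -2, 0 and 2 (hard-coded here for illustration); a spike at the very first sample has no predecessor and is left at 0, which is an arbitrary choice.

```matlab
ReqData  = [0 0 -2 -2 -2 -2 -2 -2 0 0 0 -2 -2 -2 -2 0 0 2 2 2 2 2 2 0 0 2 2 2 2 2 0 0 2 2 2 2 2 0 0 2 2 2 0 0]';
RealData = [0 0 -2 -2 -2 -2 -2 -2 0 0 0 -2 -2 -2 -2 0 0 2 2 2 2 -4 -1 0 0 2 2 2 2 -7 0 0 2 2 2 2 -1 0 0 2 2 2 0 0]';

allowedValues = [-2 0 2];                 % assumed known for this sketch

valid    = ismember(RealData, allowedValues);
validIdx = find(valid);                   % positions of the valid samples
k        = cumsum(valid);                 % number of valid samples seen so far

% For every sample, look up the most recent valid sample (itself if valid)
ReconData          = zeros(size(RealData));
hasPrev            = k > 0;               % false only before the first valid sample
ReconData(hasPrev) = RealData(validIdx(k(hasPrev)));
```

For the data in the question this gives the same result as the loop, including the two consecutive spikes at samples 22 and 23.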

Upvotes: 0

Oli

Reputation: 16045

It depends on how noisy your RealData is; the example is a little confusing. For instance, RealData(22) is negative but ReqData(22) is positive. What output do you want in that case?

I would do:

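 % 1 where the sample is positive, 0 elsewhere
 RealDataPos=double(RealData'>0);
 % [0 1 -1] gives x(n)-x(n-1): positive at the first sample of a run
 RealDataPosBeginning=find(conv(RealDataPos,[0 1 -1],'same')>0);
 % [-1 1 0] gives x(n)-x(n+1): positive at the last sample of a run
 RealDataPosEnd=find(conv(RealDataPos,[-1 1 0],'same')>0);

 % Same procedure for the negative samples
 RealDataNeg=double(RealData'<0);
 RealDataNegBeginning=find(conv(RealDataNeg,[0 1 -1],'same')>0);
 RealDataNegEnd=find(conv(RealDataNeg,[-1 1 0],'same')>0);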

PS: Comment if you want something more complicated that handles the fact that sometimes the end of a positive sequence in RealData becomes negative.
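As a cross-check, run boundaries can also be found with diff on a zero-padded mask. The sketch below applies this to the clean ReqData from the question and interleaves the results as [start stop start stop ...]'; the variable names negRuns and posRuns are made up for this example.

```matlab
ReqData = [0 0 -2 -2 -2 -2 -2 -2 0 0 0 -2 -2 -2 -2 0 0 2 2 2 2 2 2 0 0 2 2 2 2 2 0 0 2 2 2 2 2 0 0 2 2 2 0 0]';
x = ReqData(:)';                      % work on a row vector

% Pad with zeros so runs touching either end of the data are detected too
dPos = diff([0, x > 0, 0]);
dNeg = diff([0, x < 0, 0]);

posStart = find(dPos == 1);           % 0 -> 1 transition: first sample of a run
posStop  = find(dPos == -1) - 1;      % 1 -> 0 transition: last sample of a run
negStart = find(dNeg == 1);
negStop  = find(dNeg == -1) - 1;

% Interleave as [start stop start stop ...]'
posRuns = reshape([posStart; posStop], [], 1);
negRuns = reshape([negStart; negStop], [], 1);
```

On ReqData this yields negative runs at samples 3 to 8 and 12 to 15, and positive runs at 18 to 23, 26 to 30, 33 to 37 and 40 to 42. Applied directly to the noisy RealData, the spikes would shorten the positive runs and create extra negative ones, so it is best used after the spikes have been removed.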

Upvotes: 0
