Hooplator15

Reputation: 1550

How to combine groups of consecutive repeated values while preserving order? (MATLAB)

I am having trouble combining repeated elements of my MATLAB "data" variable. I can easily combine all repeated values using unique and sort:

[sorted,idx] = sort(data);
[~,ij] = unique(sorted,'first');
Indx = (sort(idx(ij)));
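A minimal sketch, using the example data vector from this question, that shows concretely why the sort/unique approach is not enough. It keeps only the first occurrence of each distinct value, so later runs of a value that already appeared are dropped. The variable name firstOccurrences is illustrative only.

```matlab
% Example input: runs of repeated values, where some values recur later
data = [1;1;1;2;2;2;3;3;3;4;4;4;4;4;3;3;2;2;2;2;1;1;1;1;4;4;4;4];

% sort/unique approach from the question
[sorted, idx] = sort(data);          % sort is stable, so ties keep their original order
[~, ij] = unique(sorted, 'first');   % first position of each distinct value in 'sorted'
Indx = sort(idx(ij));                % original positions of those first occurrences

firstOccurrences = data(Indx);       % one entry per distinct value: 1 2 3 4
disp(firstOccurrences')              % the later runs (3, 2, 1, 4) are lost
```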

However, this combines ALL repeated values. What I actually want is to combine only runs of consecutive repeated elements. For example, take this:

data = [1;1;1;2;2;2;3;3;3;4;4;4;4;4;3;3;2;2;2;2;1;1;1;1;4;4;4;4;]

Combining each group of consecutive duplicates should give:

data = [1;2;3;4;3;2;1;4;]

I need to combine the groups of repeating elements while still preserving the order. It would also help to return the indices, because I need to average the data in another variable based on how the elements were combined.

For example:

data  = [1;1;1;2;2;2;3;3;3;4;4;4;4;4;3;3;2;2;2;2;1;1;1;1;4;4;4;4;]
data2 = [7;2;4;5;3;4;6;8;5;3;5;7;4;2;4;6;8;4;3;6;7;8;4;2;9;3;2;0;]

dataCombined = [1;    2;  3;    4;   3;  2;     1;     4  ]
data2average = [4.33; 4;  6.33; 4.2; 5;  5.25;  5.25;  3.5]

Can anyone give suggestions?


SOLUTION:

Thank you all for your answers. MZimmerman6's solution worked well for me. Here is what I did to average the values in the "data2" array.

data  = [1;1;1;2;2;2;3;3;3;4;4;4;4;4;3;3;2;2;2;2;1;1;1;1;4;4;4;4];
data2 = [7;2;4;5;3;4;6;8;5;3;5;7;4;2;4;6;8;4;3;6;7;8;4;2;9;3;2;0];

% Start index of each run of consecutive equal values
change     = diff(data) ~= 0;
indices    = [1, find(change)' + 1];
compressed = data(indices)';

% End index of each run: one before the next start, or the last element
% (this also works when the whole array is a single run)
ends = [indices(2:end) - 1, numel(data)];

numberOfRepeatingGroups = numel(indices);
data2Averaged = zeros(1, numberOfRepeatingGroups);   % preallocate the result

for i = 1:numberOfRepeatingGroups
    dataToAverage    = data2(indices(i):ends(i));    % data2 values in run i
    data2Averaged(i) = mean(dataToAverage);
end


data2Averaged =

4.3333    4.0000    6.3333    4.2000    5.0000    5.2500    5.2500    3.5000
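For comparison, here is a loop-free sketch that combines the diff idea from MZimmerman6's answer with accumarray (as used in Oleg's answer), relying only on built-in functions. The names isRunStart and groupId are illustrative and do not come from the original posts.

```matlab
data  = [1;1;1;2;2;2;3;3;3;4;4;4;4;4;3;3;2;2;2;2;1;1;1;1;4;4;4;4];
data2 = [7;2;4;5;3;4;6;8;5;3;5;7;4;2;4;6;8;4;3;6;7;8;4;2;9;3;2;0];

% true at the first element of every run of consecutive equal values
isRunStart = [true; diff(data) ~= 0];

% Run label for every element: 1 for the first run, 2 for the second, ...
groupId = cumsum(isRunStart);

% One value per run, in the original order
dataCombined = data(isRunStart);

% Mean of data2 within each run (groupId is a column of positive integers)
data2Averaged = accumarray(groupId, data2, [], @mean);
```

The cumsum of the run-start flags gives each element the number of the run it belongs to, which is exactly the subscript vector accumarray needs.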

Upvotes: 3

Views: 1282

Answers (2)

Oleg

Reputation: 10676

I will never stop recommending this run-length encoding/decoding utility from the File Exchange: rude().

% Run-length encode preserving order
[len,val] = rude(data);
len =
     3     3     3     5     2     4     4     4
val =
     1     2     3     4     3     2     1     4
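If the File Exchange submission is not available, the encoding step can be sketched with built-in functions. This is not a reimplementation of rude() itself, only an equivalent for this kind of input (a column vector); it reproduces the len and val shown above.

```matlab
data = [1;1;1;2;2;2;3;3;3;4;4;4;4;4;3;3;2;2;2;2;1;1;1;1;4;4;4;4];

% Positions where a new run begins (the first element always starts a run)
starts = find([true; diff(data) ~= 0]);

% Run lengths: distance from each start to the next start (or one past the end)
len = diff([starts; numel(data) + 1])';

% Value of each run, as a row like rude() returns
val = data(starts)';
```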

Now, to calculate the means, first re-label each subsequence with rude() (decoding the run lengths so that every element gets the number of its run), then use accumarray():

% Decode and re-label each subsequence
subs = rude(len,1:numel(len))';

% Take average on each re-labelled subsequence
accumarray(subs,data2,[],@mean)
ans =
    4.3333
    4.0000
    6.3333
    4.2000
    5.0000
    5.2500
    5.2500
    3.5000
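Without rude(), the same re-labelling can be sketched with repelem, which is built into MATLAB R2015a and later. This assumes len holds the run lengths shown above; the rest mirrors the accumarray call in this answer.

```matlab
data2 = [7;2;4;5;3;4;6;8;5;3;5;7;4;2;4;6;8;4;3;6;7;8;4;2;9;3;2;0];
len   = [3 3 3 5 2 4 4 4];   % run lengths, as returned by rude(data) above

% Label every element with the number of the run it belongs to:
% run 1 is repeated len(1) times, run 2 len(2) times, and so on
subs = repelem(1:numel(len), len)';

% Average data2 within each run
data2Averaged = accumarray(subs, data2, [], @mean);
```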

Upvotes: 2

MZimmerman6

Reputation: 8603

You can use a discrete derivative (diff) to find where the values in your data array change, which marks the start of a new group. Wherever the difference is nonzero there is a change, either positive or negative. Find where these changes occur, and then take the corresponding indices, something like below.

data = [1;1;1;2;2;2;3;3;3;4;4;4;4;4;3;3;2;2;2;2;1;1;1;1;4;4;4;4;];
change = diff(data)~=0;
indices = [1,find(change)'+1];
compressed = data(indices)';

and the result will be

compressed =
     1     2     3     4     3     2     1     4

And of course you can use the indices variable for whatever you need as well.

Note: on the third line, we prepend index 1 because the start of the array always begins a new group. We also add 1 to the output of find because find is applied to the derivative: diff(data) has one fewer element than data, and a nonzero k-th element of diff(data) means data(k+1) differs from data(k), so the new group starts at k+1.
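A small worked example of that index shift, using a made-up vector x chosen only for illustration:

```matlab
x = [5;5;7;7;7;2];

d = diff(x);                 % d(k) = x(k+1) - x(k); d has numel(x)-1 = 5 elements
% d is [0; 2; 0; 0; -5]

changeAt = find(d ~= 0);     % [2; 5]: x(3) differs from x(2), x(6) differs from x(5)
starts = [1, changeAt' + 1]; % [1 3 6]: each new run begins one position later
runValues = x(starts)';      % [5 7 2]: one value per run, in order
```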

Upvotes: 4
