David K

Reputation: 1346

Combine values corresponding to repeated adjacent values in vector

I have two vectors of data that look something like this:

A = [1 2 3 3 4 5 6 6 5 4 4 3 3 3 3];
B = [1 5 9 6 4 6 8 2 1 5 7 8 3 2 6];

I would like to remove all repeated adjacent values in A and sum the corresponding values in B, with the result being

A = [1 2 3  4 5 6  5 4  3];
B = [1 5 15 4 6 10 1 12 19];

I could use unique as described in this answer, but that would combine all repeated values, regardless of whether they are adjacent. I could also use diff, as described in this question, but I don't know how to record the indices that would be combined.
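To illustrate the problem with unique, here is a minimal sketch on the data above. The variable names u, j and Bsum are chosen only for illustration:

```matlab
A = [1 2 3 3 4 5 6 6 5 4 4 3 3 3 3];
B = [1 5 9 6 4 6 8 2 1 5 7 8 3 2 6];

[u, ~, j] = unique(A);            % j maps each element of A to its entry in u
Bsum = accumarray(j(:), B(:)).';  % sums B over every occurrence of each value

% u    is [1 2 3 4 5 6]
% Bsum is [1 5 34 16 7 10]
```

The two separate runs of 3 (and of 4 and 5) are merged into one entry each, which is not what I want.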

I could always just iterate through the vector, but that seems needlessly tedious and I feel there should be a more elegant solution. Is there a way to achieve this in just a couple lines?

Upvotes: 3

Views: 67

Answers (1)

rayryeng

Reputation: 104464

You could use diff to find where neighbouring values change, then combine this with cumsum to generate a group ID for each run of equal values. Non-zero entries in the difference result mark the positions where a new run starts, while zero entries mark values that repeat the one before them. Applying cumsum to this logical result produces an ID array that goes from 1 up to the number of groups, where all elements that share an ID belong to the same consecutive run. This is an ideal input into accumarray, where we can sum all of the values in B that belong to each group:

Aval = A(:); % Unroll into a column to ensure shape compliance
ind = diff([Inf; Aval]) ~= 0; % True wherever a new run of equal values starts
IDs = cumsum(ind); % Create ID array
Aout = Aval(ind).'; % Determine all unique values per group
Bout = accumarray(IDs(:), B(:)).'; % Find their sum

I will admit that this is not a couple of lines, as most of it is setup, but the core of the answer is the second, third and last lines of code. Note the subtlety with accumarray: its inputs are expected to be column vectors. To enforce this, I use (:) to unroll the vectors into columns regardless of their original shape, starting with the first line of code. I then transpose the results at the end, because accumarray outputs a column vector in this case and you want row vectors as the result.
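If you need this in more than one place, the steps above could be wrapped in a function file. This is only a sketch: the name sumAdjacentRuns is hypothetical, and it assumes A and B are non-empty numeric vectors of the same length.

```matlab
function [Aout, Bout] = sumAdjacentRuns(A, B)
% sumAdjacentRuns  Hypothetical helper name, used here for illustration only.
% Collapses runs of equal adjacent values in A and sums the matching entries of B.
% Assumes A and B are non-empty numeric vectors with the same number of elements.
    assert(numel(A) == numel(B), 'A and B must have the same number of elements');
    Aval = A(:);                      % force a column, whatever the input shape
    ind  = diff([Inf; Aval]) ~= 0;    % true at the first element of each run
    IDs  = cumsum(ind);               % run number for every element
    Aout = Aval(ind).';               % one representative value per run (row)
    Bout = accumarray(IDs, B(:)).';   % per-run sums of B (row)
end
```

Because of the (:) unrolling, calling [Aout, Bout] = sumAdjacentRuns(A, B) should behave the same whether A and B are passed as rows or as columns.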

For your test input:

A = [1 2 3 3 4 5 6 6 5 4 4 3 3 3 3];
B = [1 5 9 6 4 6 8 2 1 5 7 8 3 2 6];

The output of the diff result gives:

>> ind.'

ind =

     1     1     1     0     1     1     1     0     1     1     0     1     0     0     0

You can see that the zeros correspond exactly to positions that repeat the previous value. The output of the ID array once you run cumsum gives:

>> IDs.'

IDs =

     1     2     3     3     4     5     6     6     7     8     8     9     9     9     9

Performing cumsum on the ind array transforms the diff result so that each consecutive group gets a unique ID. We can also use ind to index into A to find the value for each group, which is the fourth line. The last line sums over each group. Note that the fourth line is transposed to become a row vector because I unrolled the data into a column vector to begin with.
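In case the Inf in the diff call looks odd: diff returns one element fewer than its input, and the first element always starts a new group, so prepending Inf makes the first difference non-zero. A small sketch of an equivalent form without the sentinel (ind1 and ind2 are names chosen for this comparison only):

```matlab
A    = [1 2 3 3 4 5 6 6 5 4 4 3 3 3 3];
Aval = A(:);

% Sentinel form used above: the difference against Inf is never zero,
% so the first entry of the result is always true.
ind1 = diff([Inf; Aval]) ~= 0;

% Equivalent form: mark the first element explicitly, then compare neighbours.
ind2 = [true; diff(Aval) ~= 0];

same = isequal(ind1, ind2);   % true for this input
IDs  = cumsum(ind2);          % same group IDs as shown above
```

Either form should feed cumsum and accumarray in the same way. Which one reads better is a matter of taste.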

We get the desired output:

>> Aout

Aout =

     1     2     3     4     5     6     5     4     3

>> Bout

Bout =

     1     5    15     4     6    10     1    12    19
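As a sanity check, the explicit loop mentioned in the question can be used to confirm the vectorized result. This is a minimal sketch that assumes A and B are non-empty and the same length. Aloop and Bloop are names introduced here for the comparison:

```matlab
A = [1 2 3 3 4 5 6 6 5 4 4 3 3 3 3];
B = [1 5 9 6 4 6 8 2 1 5 7 8 3 2 6];

Aloop = A(1);   % the first element always starts the first run
Bloop = B(1);
for k = 2:numel(A)
    if A(k) == A(k-1)
        Bloop(end) = Bloop(end) + B(k);   % same run: keep accumulating
    else
        Aloop(end+1) = A(k);              % value changed: start a new run
        Bloop(end+1) = B(k);
    end
end

% Aloop is [1 2 3 4 5 6 5 4 3]
% Bloop is [1 5 15 4 6 10 1 12 19]
```

Both vectors match Aout and Bout above.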

Upvotes: 7
