foglerit

Reputation: 8269

MATLAB: Conditional summation

I have two arrays of the following form:

v1 = [ 1 2 3 4 5 6 7 8 9 ... ]
c2 = { 'a' 'a' 'a' 'b' 'b' 'c' 'c' 'c' 'c' ... }

(All values are examples only; no pattern can be assumed in the real data. v1 and c2 have the same size.)

I want to obtain a vector containing the summation of the components of v1 corresponding to equal values in c2. In the example above, the first component of the resulting vector would be 1+2+3, the second 4+5, and so on.

I know I can do it in a loop of the form:

uni_c2 = unique(c2);
result = zeros(size(uni_c2));
for i = 1:numel(uni_c2)
     result(i) = sum( v1(strcmp(uni_c2(i),c2)) );
end 

Is there a single command or a vectorized way of doing the same operation?

Upvotes: 3

Views: 2135

Answers (3)

Carl F.

Reputation: 7046

You can do this in two lines:

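[b, ~, n] = unique(c2);            % b: unique strings; n: group index of each element of c2
result = accumarray(n(:), v1(:))   % (:) forces column vectors, whatever orientation unique returns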

The elements of result correspond to the strings in the cell array b.
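
As a quick check, here is a minimal sketch of the same approach run on the example data from the question. The `(:)` reshaping and the printing loop are additions for illustration and are not part of the original two-liner.

```matlab
% Example data from the question
v1 = [1 2 3 4 5 6 7 8 9];
c2 = {'a' 'a' 'a' 'b' 'b' 'c' 'c' 'c' 'c'};

% b holds the unique strings; n(k) is the position in b of c2{k}
[b, ~, n] = unique(c2);

% Sum the entries of v1 that share a group index
result = accumarray(n(:), v1(:));

% Pair each unique string with its sum
for k = 1:numel(b)
    fprintf('%s: %d\n', b{k}, result(k));
end
```

Here result is the column vector [6; 9; 30], matching 1+2+3, 4+5 and 6+7+8+9.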

Upvotes: 3

Egon

Reputation: 4787

I think a very general (and vectorized) solution is something like this:

v1 = [ 1 2 3 4 5 6 7 8 9  ]
c2 = { 'a' 'a' 'a' 'b' 'b' 'c' 'c' 'c' 'c'  }
uniqueValuesInC2 = unique(c2);
conditionalSumOfV1 = @(x)(sum(v1(strcmp(c2, x))));
result = cellfun(conditionalSumOfV1, uniqueValuesInC2)

Perhaps my solution needs a bit of an explanation to the untrained eye:

So first you actually need to compute the different possible values in c2, which is done by unique.

The conditionalSumOfV1 function takes an argument x, compares every element of c2 with x, selects the corresponding elements of v1, and sums them.

Finally, cellfun is comparable to a foreach construct in some other languages: the function conditionalSumOfV1 is evaluated for every element of the cell array you provide (in this case, every unique value in c2), and the results are stored in the output array. For other types of container variables (arrays, structs), MATLAB has equivalent foreach-like constructs: arrayfun and structfun.

This works for contents of c2 that are longer than a single character, and it does not require a large repmat operation as stardt's solution does. I do, however, have my doubts about long arrays where c2 has only a few duplicate values (that is, many unique values), because strcmp then scans all of c2 once per unique value, but I guess that will be a hard case for most algorithms. If you are in such a case, you might need to take a look at the extra outputs of unique or write your own alternative to unique (i.e. write for loops, preferably in a compiled language/MEX).

Upvotes: 0

stardt

Reputation: 1219

This is vectorized but a bad idea for very large vectors. For some problems a "vectorized" solution is worse than a for loop.

>> v1 = [ 1 2 3 4 5 6 7 8 9];
>> c2 = 'aaabbcccc'-'a'
c2 =
   0   0   0   1   1   2   2   2   2
>> N = repmat(c2',1,max(c2)-min(c2)+1) == repmat([min(c2):max(c2)],size(c2,2),1);
>> v1*N
ans =
    6    9   30
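
The matrix N above is dense, with one row per element and one column per group, which is where the memory cost comes from. As a rough sketch of a variation (not the code above), the same indicator matrix can be built with sparse so that only one nonzero per element is stored. The variable names rows, S and sums are made up for this illustration, and the group indices are assumed to come from the third output of unique.

```matlab
v1 = [1 2 3 4 5 6 7 8 9];
c2 = {'a' 'a' 'a' 'b' 'b' 'c' 'c' 'c' 'c'};

% n(k) is the group index of c2{k}; force a column vector
[b, ~, n] = unique(c2);
n = n(:);

% Sparse indicator matrix: S(k, g) = 1 when element k belongs to group g
rows = (1:numel(n))';
S = sparse(rows, n, 1, numel(n), numel(b));

% Row vector times indicator matrix gives one sum per group
sums = full(v1 * S);
```

For the example data, sums is [6 9 30], the same values as v1*N above.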

Upvotes: 1
