Reputation: 8269
I have two arrays of the following form:
v1 = [ 1 2 3 4 5 6 7 8 9 ... ]
c2 = { 'a' 'a' 'a' 'b' 'b' 'c' 'c' 'c' 'c' ... }
(All values are examples only; no pattern can be assumed in the real data. v1 and c2 have the same size.)
I want to obtain a vector containing the sum of the components of v1 corresponding to equal values in c2. In the example above, the first component of the resulting vector would be 1+2+3, the second 4+5, and so on.
I know I can do it in a loop of the form:
uni_c2 = unique(c2);
result = zeros(size(uni_c2));
for i = 1:numel(uni_c2)
    result(i) = sum( v1(strcmp(uni_c2(i),c2)) );
end
Is there a single command or a vectorized way of doing the same operation?
Upvotes: 3
Views: 2135
Reputation: 7046
You can do this in two lines:
[b, ~, n] = unique(c2);
result = accumarray(n(:), v1(:));
The elements of result correspond to the strings in the cell array b.
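A self-contained sketch of the two lines above, using only the first nine elements of the example data (the real data in the question is elided). It uses n(:) and v1(:) because the orientation of the index vector returned by unique has differed between releases (and between MATLAB and Octave); forcing columns makes accumarray read one subscript per element either way.

```matlab
% Example data from the question, truncated to nine elements (assumption).
v1 = [1 2 3 4 5 6 7 8 9];
c2 = {'a' 'a' 'a' 'b' 'b' 'c' 'c' 'c' 'c'};

% b: sorted unique labels; n: for each element of c2, its index into b.
[b, ~, n] = unique(c2);

% Force column vectors so accumarray treats n as one subscript per element.
result = accumarray(n(:), v1(:));   % column vector: result(k) sums v1 where c2 equals b{k}
```

Here result is [6; 9; 30] and b holds 'a', 'b', 'c', so result(2) is the sum for label b{2}.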
Upvotes: 3
Reputation: 4787
I think a very general (and vectorized) solution is something like this:
v1 = [ 1 2 3 4 5 6 7 8 9 ]
c2 = { 'a' 'a' 'a' 'b' 'b' 'c' 'c' 'c' 'c' }
uniqueValuesInC2 = unique(c2);
conditionalSumOfV1 = @(x)(sum(v1(strcmp(c2, x))));
result = cellfun(conditionalSumOfV1, uniqueValuesInC2)
Perhaps my solution needs a bit of an explanation to the untrained eye:
First, you need to compute the distinct values in c2, which is done by unique.
The conditionalSumOfV1 function takes an argument x, compares every element of c2 with x, selects the corresponding elements of v1 and sums them.
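To make that step concrete, this sketch evaluates the pieces of conditionalSumOfV1 for a single label, 'b', with the nine-element example data. The intermediate names mask, selected and total_b are illustrative only.

```matlab
v1 = [1 2 3 4 5 6 7 8 9];
c2 = {'a' 'a' 'a' 'b' 'b' 'c' 'c' 'c' 'c'};

% strcmp of a cell array against one string returns a logical mask of the same size.
mask = strcmp(c2, 'b');        % true at positions 4 and 5
selected = v1(mask);           % the matching elements of v1

conditionalSumOfV1 = @(x) sum(v1(strcmp(c2, x)));
total_b = conditionalSumOfV1('b');   % 4 + 5
```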
Finally, cellfun is comparable to a foreach construct in some other languages: the function conditionalSumOfV1 is evaluated for every value in the cell array you provide (in this case, every unique value in c2), and each result is stored in the output array. For other types of container variables (arrays, structs), MATLAB has equivalent foreach-like constructs: arrayfun and structfun.
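As an illustration of the arrayfun variant, suppose the labels were numeric instead of strings. The vector g below is a hypothetical stand-in for c2 with the same grouping; the per-label sum then uses == instead of strcmp.

```matlab
v1 = [1 2 3 4 5 6 7 8 9];
% Hypothetical numeric labels with the same grouping as the c2 example.
g = [0 0 0 1 1 2 2 2 2];

% arrayfun calls the anonymous function once per element of unique(g)
% and collects the scalar sums into an array of the same shape.
result = arrayfun(@(k) sum(v1(g == k)), unique(g));
```

Since unique(g) is the row [0 1 2], result is the row [6 9 30].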
This works for contents of c2 that are longer than a single character, and unlike stardt's solution it does not require a large repmat operation. I do, however, have my doubts when it comes to long arrays where c2 has only a few duplicate values: strcmp scans all of c2 once per unique value, so the cost grows roughly with numel(c2)*numel(unique(c2)). I guess that will be a hard case for most algorithms. If you are in such a case, you might need to take a look at the extra outputs of unique or write your own alternative to unique (e.g. write for loops, preferably in a compiled language/MEX).
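A minimal sketch of the "extra outputs of unique" route, assuming the same nine-element example: the third output of unique maps each element to its label index, so a single plain loop over v1 accumulates all sums without calling strcmp once per unique value. The names u and idx are illustrative.

```matlab
v1 = [1 2 3 4 5 6 7 8 9];
c2 = {'a' 'a' 'a' 'b' 'b' 'c' 'c' 'c' 'c'};

% One call to unique; idx(k) is the position of c2{k} within u.
[u, ~, idx] = unique(c2);
result = zeros(numel(u), 1);
for k = 1:numel(v1)
    % Single pass over the data: add each value to its label's running sum.
    result(idx(k)) = result(idx(k)) + v1(k);
end
```

This loop is essentially what accumarray does internally, so in practice accumarray(idx(:), v1(:)) gives the same column [6; 9; 30].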
Upvotes: 0
Reputation: 1219
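This is vectorized but a bad idea for very large vectors, because the comparison builds a full numel(c2)-by-(max(c2)-min(c2)+1) logical matrix. It also relies on each label being a single character, so that subtracting 'a' turns the labels into small integers (here 0, 1 and 2). For some problems a "vectorized" solution is worse than a for loop.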
>> v1 = [ 1 2 3 4 5 6 7 8 9];
>> c2 = 'aaabbcccc'-'a'
c2 =
0 0 0 1 1 2 2 2 2
>> N = repmat(c2',1,max(c2)-min(c2)+1) == repmat([min(c2):max(c2)],size(c2,2),1);
>> v1*N
ans =
6 9 30
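A possible variant, not part of the answer above: the same v1*N product can use a sparse indicator matrix built from the third output of unique. That removes the single-character restriction and stores only numel(c2) nonzeros instead of a dense matrix. The names u, idx, rows and nRows are illustrative.

```matlab
v1 = [1 2 3 4 5 6 7 8 9];
c2 = {'a' 'a' 'a' 'b' 'b' 'c' 'c' 'c' 'c'};

% Integer label per element; works for multi-character strings too.
[u, ~, idx] = unique(c2);
nRows = numel(idx);
rows = (1:nRows).';          % one row per element of v1

% Sparse indicator: N(i, j) = 1 when element i carries label u{j}.
N = sparse(rows, idx(:), 1, nRows, numel(u));

% Row vector times indicator matrix gives the per-label sums.
result = full(v1(:).' * N);
```

For the example data this gives the same [6 9 30] as the dense version.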
Upvotes: 1