Reputation: 349
Consider 2 Vectors A = [20000000 x 1]
and B = [20000000 x 1 ]
I would need to find the sum of all A corresponding to every unique element of B.
Although this looks really easy, this is taking forever in MATLAB.
Currently, I am using
u = unique(B);
length_u = length(u);
C = zeros(length_u,1);
for i = 1:length_u
C(i,1) = sum(A(B==u(i)));
end
Is there anyway to make it run faster? I tried splitting the loop and running 2 parfor
loops using the parallel computing toolbox(because I have only 2 cores). Still takes hours.
P.S: Yes, I should get a better computer.
Upvotes: 7
Views: 272
Reputation: 306
Further simplification of code suggested by Shai:
A = randi( 500, 1, 100000 );
B = randi( 500, 1, 100000 );
[~,~,idb] = unique( B );
C = accumarray( idb', A' )';
The "idb"
here gives a vector same as that of "idx"
in code suggested by Shai.
Upvotes: 3
Reputation: 114796
You must see this answer first.
If you must, you can use a combination of histc
and accumarray
A = randi( 500, 1, 100000 );
B = randi( 500, 1, 100000 );
ub = unique( B );
[ignore idx] = histc( B, [ub-.5 ub(end)+.5] );
C = accumarray( idx', A' )';
see a toy comparison to the naive for
-loop implementation on ideone.
We use the second outout of histc
to map elements of B
(and later A
) to the bins defined by the elements of ub
(the unique elements of B
).
accumarray
is then used to sum all entries of A
accorind to the mapping defined by idx
.
Note: I assume the unique elements of B
are at least 0.5 apart.
Upvotes: 6
Reputation: 1164
I modified the sum. Instead of having to check each element rather it fits the case (B==u(i)
) or not, I sorted the array and stopped the moment the element changed. While starting the next sum from that element. That way I only had to loop over each element in A ones, instead of length_u
times. Here is the code I used:
A= rand(100000,1);
B= round(rand(100000,1)*25000);
u = unique(B);
length_u = length(u);
C = zeros(length_u,1);
E = zeros(length_u,1);
tic;
for k = 1:length_u
C(k,1) = sum(A(B==u(k)));
end
t_OP=toc;
tic
D= sortrows([A,B],2);
n=1;
for l=1:numel(u)
m=n;
while m<numel(B) && D(m+1,2)==u(l)
m=m+1;
end
E(l,1) = sum(D(n:m,1));
n=m+1;
end
t_trial=toc;
display(t_OP)
display(t_trial)
I used your code as well. The elapsed time for your code was: t_OP=10.9398
and for my modification: t_trial=0.0962
. Hopefully this helps. I made sure the code worked by building sum(E-C)
which was 0
.
EDIT: Speedtest
I compared it to @Shai's solution as well. THis resulted in
t_OP =
10.8147
t_trial =
0.0984
t_Shai =
0.0154
EDIT: Comment by @moarningsun
Instead of using the while
-loop. You can use the second output of unique if you sort your array before building the sum.
tic
A = randi( 25000, 1, 100000 );
B = randi( 25000, 1, 100000 );
D= sortrows([A',B'],2);
[u, idx] = unique(D(:,2));
idx = [idx; numel(D(:,2))+1];
for l=1:numel(u)
E(l,1) = sum(D(idx(l):idx(l+1)-1,1));
end
t_trial=toc;
Upvotes: 1
Reputation: 112679
If B
contains only integers, you can do it easily in one line, using the fact that sparse
adds elements with the same index:
C = nonzeros(sparse(B,1,A));
Upvotes: 3