enigmae
enigmae

Reputation: 349

Accelerating Iterations- MATLAB

Consider 2 Vectors A = [20000000 x 1] and B = [20000000 x 1 ]

I would need to find the sum of all A corresponding to every unique element of B.

Although this looks really easy, this is taking forever in MATLAB.

Currently, I am using

u = unique(B);
length_u = length(u);
C = zeros(length_u,1);

for i = 1:length_u
   C(i,1) = sum(A(B==u(i)));
end

Is there anyway to make it run faster? I tried splitting the loop and running 2 parfor loops using the parallel computing toolbox(because I have only 2 cores). Still takes hours.

P.S: Yes, I should get a better computer.

Upvotes: 7

Views: 272

Answers (4)

Naveen
Naveen

Reputation: 306

Further simplification of code suggested by Shai:

A = randi( 500, 1, 100000 );
B = randi( 500, 1, 100000 );

[~,~,idb] = unique( B );

C = accumarray( idb', A' )';

The "idb" here gives a vector same as that of "idx" in code suggested by Shai.

Upvotes: 3

Shai
Shai

Reputation: 114796

You must see this answer first.
If you must, you can use a combination of histc and accumarray

A = randi( 500, 1, 100000 );
B = randi( 500, 1, 100000 );

ub = unique( B );

[ignore idx] = histc( B, [ub-.5 ub(end)+.5] );
C = accumarray( idx', A' )';

see a toy comparison to the naive for-loop implementation on ideone.

How does it work?

We use the second outout of histc to map elements of B (and later A) to the bins defined by the elements of ub (the unique elements of B).
accumarray is then used to sum all entries of A accorind to the mapping defined by idx.
Note: I assume the unique elements of B are at least 0.5 apart.

Upvotes: 6

The Minion
The Minion

Reputation: 1164

I modified the sum. Instead of having to check each element rather it fits the case (B==u(i)) or not, I sorted the array and stopped the moment the element changed. While starting the next sum from that element. That way I only had to loop over each element in A ones, instead of length_u times. Here is the code I used:

A= rand(100000,1);
B= round(rand(100000,1)*25000);
u = unique(B);
length_u = length(u);
C = zeros(length_u,1);
E = zeros(length_u,1);
tic;
for k = 1:length_u
   C(k,1) = sum(A(B==u(k)));
end
t_OP=toc;

tic
D= sortrows([A,B],2);
n=1;
for l=1:numel(u)
    m=n;
    while m<numel(B) && D(m+1,2)==u(l) 
        m=m+1;
    end
    E(l,1) = sum(D(n:m,1));
    n=m+1;
end
t_trial=toc;
display(t_OP)
display(t_trial)

I used your code as well. The elapsed time for your code was: t_OP=10.9398 and for my modification: t_trial=0.0962. Hopefully this helps. I made sure the code worked by building sum(E-C) which was 0.
EDIT: Speedtest
I compared it to @Shai's solution as well. THis resulted in

t_OP =

   10.8147
t_trial =

    0.0984
t_Shai =

    0.0154


EDIT: Comment by @moarningsun
Instead of using the while-loop. You can use the second output of unique if you sort your array before building the sum.

tic
A = randi( 25000, 1, 100000 );
B = randi( 25000, 1, 100000 );
D= sortrows([A',B'],2);
[u, idx] = unique(D(:,2));
idx = [idx; numel(D(:,2))+1];
for l=1:numel(u)
    E(l,1) = sum(D(idx(l):idx(l+1)-1,1));
end
t_trial=toc;

Upvotes: 1

Luis Mendo
Luis Mendo

Reputation: 112679

If B contains only integers, you can do it easily in one line, using the fact that sparse adds elements with the same index:

C = nonzeros(sparse(B,1,A));

Upvotes: 3

Related Questions