abdfahim
abdfahim

Reputation: 2553

Conditional Sum in Array

I have 2 arrays, A and B. I want to form a new array C with same dimension as B where each element will show SUM(A) for A > B

Below is my working code

A = [1:1:1000]
B=[1:1:100]
for n = 1:numel(B)
    C(n) = sum(A(A>B(n)));
end

However, when A has millions of rows and B has thousands, and I have to do similar calculations for 20 array-couples,it takes insane amount of time.

Is there any faster way?

For example, histcounts is pretty fast, but it counts, rather than summing.

Thanks

Upvotes: 7

Views: 365

Answers (3)

Divakar
Divakar

Reputation: 221524

You had the right ideas with histcounts, as you are basically "accumulating" certain A elements based on binning. This binning operation could be done with histc. Listed in this post is a solution that starts off with similar steps as listed in @David's answer and then uses histc to bin and sum up selective elements from A to get us the desired output and all of it in a vectorized manner. Here's the implementation -

%// Sort A and B and also get sorted B indices
sA = sort(A);
[sB,sortedB_idx] = sort(B);

[~,bin] = histc(sB,sA);     %// Bin sorted B onto sorted A
C_out = zeros(1,numel(B));  %// Setup output array

%// Take care of the case when all elements in B are greater than A  
if sA(1) > sB(end)
    C_out(:) = sum(A);
end

%// Only do further processing if there is at least one element in B > any element in A
if any(bin)
    csA = cumsum(sA,'reverse'); %// Reverse cumsum on sorted A

    %// Get sum(A(A>B(n))) for every n, but for sorted versions
    valid_mask = cummax(bin) - bin ==0;
    valid_mask2 = bin(valid_mask)+1 <= numel(A);
    valid_mask(1:numel(valid_mask2)) = valid_mask2;
    C_out(valid_mask) = csA(bin(valid_mask)+1);

    %// Rearrange C_out to get back in original unsorted version
    [~,idx] = sort(sortedB_idx);
    C_out = C_out(idx);
end

Also, please remember when comparing the result from this method with the one from the original for-loop version that there would be slight variations in output as this vectorized solution uses cumsum which computes a running summation and as such would have large cumulatively summed numbers being added to individual elements that are comparatively very small, whereas the for-loop version would sum only selective elements. So, floating-precision issues would come up there.

Upvotes: 5

GJStein
GJStein

Reputation: 658

Depending on the size of your arrays (and your memory limitations), the following code might be slightly faster:

C = A*bsxfun(@gt,A',B);

Though it's vectorized, however, it seems to be bottlenecked (perhaps) by the allocation of memory. I'm looking to see if I can get a further speedup. Depending on your input vector size, I've seen up to a factor of 2 speedup for large vectors.

Upvotes: 7

David
David

Reputation: 8459

Here's a method that is a bit quicker, but I'm sure there is a better way to solve this problem.

a=sort(A); %// If A and B are already sorted then this isn't necessary!
b=sort(B);
c(numel(B))=0; %// Initialise c
s=cumsum(a,2,'reverse'); %// Get the partial sums of a
for n=1:numel(B)
    %// Pull out the sum for elements in a larger than b(n)
    c(n)=s(find(a>b(n),1,'first'));
end

According to some very rough tests, this seems to run a bit better than twice as fast as the original method.

Upvotes: 6

Related Questions