gnychis
gnychis

Reputation: 7555

a faster way to achieve what intersect() is giving me?

I am finding that a lot of time spent in my matlab function is in this code:

intersect(freq_bins, our_bins);

Both can be rather large vectors, and are comprised of only integers. I just need to know which integers are in both. This is truly the primitive purpose of intersect(), so I suspect that the answer is: it doesn't get any better. But maybe someone has some suggestions.

Upvotes: 7

Views: 3893

Answers (2)

Doug
Doug

Reputation: 1466

If you can assume that your inputs contain sorted lists of unique integers, then you can do this in linear time with a very simple algorithm:

function c = intersect_sorted(a,b)
  ia = 1;
  na = length(a);
  ib = 1;
  nb = length(b);
  ic = 0;
  cn = min(na,nb);
  c = zeros(1,cn);

  while (ia <= na && ib <= nb)
    if (a(ia) > b(ib))
      ib = ib + 1;
    elseif a(ia) < b(ib)
      ia = ia + 1;
    else % a(ia) == b(ib)
      ic = ic + 1;
      c(ic) = a(ia);
      ib = ib + 1;
      ia = ia + 1;
    end
  end
  c = c(1:ic);
end

The max runtime for lists of length n and m will be O(n+m).

>>a = unique(randi(1000,100,1));
>>b = unique(randi(1000,100,1));
>>tic;for i = 1:10000, intersect(a,b); end,toc
Elapsed time is 1.224514 seconds.
>> tic;for i = 1:10000, intersect_sorted(a,b); end,toc
Elapsed time is 0.289075 seconds.

Upvotes: 0

Jonas
Jonas

Reputation: 74940

intersect calls ismember. In your case, you don't need all the complicated checks that intersect does, so you can save some overhead and call ismember directly (note: I made sure to call both functions before timing them):

a = randi(1000,100,1);
b = randi(1000,100,1);

>> tic,intersect(a,b),toc
ans =
    76
   338
   490
   548
   550
   801
   914
   930
Elapsed time is 0.027104 seconds.

>> tic,a(ismember(a,b)),toc
ans =
   914
   801
   490
   548
   930
   550
    76
   338
Elapsed time is 0.000613 seconds.

You can make this even faster by calling ismembc, the function that does the actual testing, directly. Note that ismembc requires sorted arrays (so you can drop the sort if your input is sorted already!)

tic,a=sort(a);b=sort(b);a(ismembc(a,b)),toc
ans =
    76
   338
   490
   548
   550
   801
   914
   930
Elapsed time is 0.000473 seconds.

Upvotes: 10

Related Questions