Reputation: 7555
I am finding that a lot of time spent in my matlab function is in this code:
intersect(freq_bins, our_bins);
Both can be rather large vectors, and are comprised of only integers. I just need to know which integers are in both. This is truly the primitive purpose of intersect(), so I suspect that the answer is: it doesn't get any better. But maybe someone has some suggestions.
Upvotes: 7
Views: 3893
Reputation: 1466
If you can assume that your inputs contain sorted lists of unique integers, then you can do this in linear time with a very simple algorithm:
function c = intersect_sorted(a,b)
ia = 1;
na = length(a);
ib = 1;
nb = length(b);
ic = 0;
cn = min(na,nb);
c = zeros(1,cn);
while (ia <= na && ib <= nb)
if (a(ia) > b(ib))
ib = ib + 1;
elseif a(ia) < b(ib)
ia = ia + 1;
else % a(ia) == b(ib)
ic = ic + 1;
c(ic) = a(ia);
ib = ib + 1;
ia = ia + 1;
end
end
c = c(1:ic);
end
The max runtime for lists of length n and m will be O(n+m).
>>a = unique(randi(1000,100,1));
>>b = unique(randi(1000,100,1));
>>tic;for i = 1:10000, intersect(a,b); end,toc
Elapsed time is 1.224514 seconds.
>> tic;for i = 1:10000, intersect_sorted(a,b); end,toc
Elapsed time is 0.289075 seconds.
Upvotes: 0
Reputation: 74940
intersect
calls ismember
. In your case, you don't need all the complicated checks that intersect
does, so you can save some overhead and call ismember
directly (note: I made sure to call both functions before timing them):
a = randi(1000,100,1);
b = randi(1000,100,1);
>> tic,intersect(a,b),toc
ans =
76
338
490
548
550
801
914
930
Elapsed time is 0.027104 seconds.
>> tic,a(ismember(a,b)),toc
ans =
914
801
490
548
930
550
76
338
Elapsed time is 0.000613 seconds.
You can make this even faster by calling ismembc
, the function that does the actual testing, directly. Note that ismembc
requires sorted arrays (so you can drop the sort if your input is sorted already!)
tic,a=sort(a);b=sort(b);a(ismembc(a,b)),toc
ans =
76
338
490
548
550
801
914
930
Elapsed time is 0.000473 seconds.
Upvotes: 10