guskenny83
guskenny83

Reputation: 1373

Why is MATLAB's union function so slow?

If i want to find the union of two unordered sets, represented as 1D vectors, eg:

a = [2 4 6 8 1]
b = [1 2 5 7 9]

i can use the union function:

c = union(a,b)

which gives the answer:

c = [1 2 4 5 6 7 8 9]

however, this appears to be quite slow (relatively speaking). If i run a tic-toc test on it i get:

>> for test = 1
tic
c = union(a,b);
toc
end
Elapsed time is 0.000906 seconds.

whereas, if i use this much more convoluted method, i get a much quicker result:

>> for test = 1
tic
a_1 = zeros(1,9);
b_1 = zeros(1,9);
a_1(a) = 1;
b_1(b) = 1;
c_1 = or(a_1,b_1);
c = find(c_1);
toc
end
Elapsed time is 0.000100 seconds.

this still gives the same answer for c, but is about 9 times as fast (with this small example, im not sure about how well it scales)

what is the advantage of ever using union? and can anyone suggest a more compact way of expressing the second method that i used?

thanks

Upvotes: 1

Views: 601

Answers (1)

Matthew Gunn
Matthew Gunn

Reputation: 4519

Several comments:

  1. Your code isn't as general as union. You're assuming set members are strictly positive integers.
  2. Your code will blow up if a set member has a large value, eg. 10^20, as it tries to allocate an absurd quantity of memory.
  3. A bunch of Matlab functions that don't use BLAS/LAPACK or aren't built-in are actually quite slow. They're there for convenience but don't be shocked when you can roll something faster yourself, especially if you can specialize for your particular problem.
  4. Representing sets using logical arrays (which is what you do) can be highly efficient for problems where the set of possible set members is quite small.

Upvotes: 5

Related Questions