Kareem Ergawy
Kareem Ergawy

Reputation: 677

Efficiently calculating pairwise intersection between cell array elements in Matlab

Suppose you have 2 cell arrays:

tri1 (4x1 cell)

[1; 3; 4]
[2; 4]
[1; 2; 3]
[1; 3; 4]

and tri2 (4x1 cell)

[1; 2; 3]
[1; 2; 3]
[1; 3; 4]
[2; 4]

and you want to calculate the intersection between each 2 corresponding cell elements. In this example, the result would be something like this:

tri12 (4x1 cell)

[1; 3]
2
[1; 3]
4

Currently I calculate this intersection using cellfun like this:

tri12 = cellfun(@(t1, t2){intersect(t1, t2)}, tri1, tri2);

However, this is extremely slow when the cell arrays are large.

My question, is there a faster way to calculate the same intersection result without having to use cellfun?

Please note that there is no need to maintain the exact current format of the involved objects (i.e. being cell arrays). So any fast solution that would compute the same result is perfectly fine.

Upvotes: 2

Views: 377

Answers (4)

Code42
Code42

Reputation: 2482

I wrote a script to do pairwise intersection.

You will need to convert the cells to rectangle tables first and then do MetaIntersect(A,B)

Upvotes: 0

Jonathan H
Jonathan H

Reputation: 7953

Building upon Matthew Gunn solution which I find elegant, in the case of bounded non-negative integers, where the upper bound is relatively large still, and when the cells are very large.

We can basically apply this to each pair of cell elements, while preallocating a long row of logicals which can be reused for each pair, taking care about cleaning it up each time. This would look like:

function C1_C2 = fast_intersect( C1, C2, max_int )

    x = false(1,max_int);
    y = false(1,max_int);

    C1_C2 = cellfun( @fast_intersect_impl, C1, C2, 'UniformOutput', false );


    function ab = fast_intersect_impl(a,b)

        x(a)=true; y(b)=true;
        ab = find(x & y);
        x(a)=false; y(b)=false;

    end

end

Upvotes: 2

Matthew Gunn
Matthew Gunn

Reputation: 4519

If the numbers in your sets are integers indexed from 1 to some not so huge element, you could represent the sets in a matrix using columns as indicators. (i.e. X(i,j) is a binary indicator as to whether element j is in set i.) Load up the matrices with:

X = false(length(tri1), MAXELEMENT);
for i=1:length(tri1), 
  X(i,tri1{i}') = true; 
end
Y = false(length(tri2), MAXELEMENT);
for i=1:length(tri2), 
  Y(i,tri2{i}') = true; 
end

Then the intersection is just myintersection = X & Y; which should be super fast.

Upvotes: 1

Jonathan H
Jonathan H

Reputation: 7953

I don't think there is any improvement to your solution. However, are you sure you didn't mean to write

tri12 = cellfun( @(t1, t2) intersect(t1, t2), tri1, tri2, 'UniformOutput', false );

instead? That is, returning an array, and not a cell array for each couple? This could speed things up a little.

intersect uses unique under the hood for numeric arrays, which is as good as it gets without further constraints. However, if you know the values in your arrays are all positive integers bounded by a relatively small value (say under tens of millions), you might be able to cook an ad hoc solution in C/Mex using bin counts. The Matlab equivalent would be using accumarray, but I doubt that would be faster than unique.

Upvotes: 0

Related Questions