M.G.
M.G.

Reputation: 293

Finding groups of k disjoint rows in a matrix

Problem statement: Let S be a set of integers, e.g. S = {1,2,...,12}. M is an integer matrix whose rows can be viewed as subsets of S, whose length divides the number of elements of S, in particular every row of M contains only distinct elements/integers. I am trying to produce Matlab code that can identify all groups of disjoint rows of M whose set-theoretic union gives S(i.e. a partition of S in subsets with a fixed size) and return each such group as a matrix.

Example: A = [1 2 5 6; 3 4 11 12; 9 10 7 8] is a partition of S, whereas B = [1 2 3 4; 5 6 7 8; 9 10 11 12; 1 5 9 12] is not a partition of S (since the last row is not disjoint with the previous 3).

Order of magnitude: Typically, M would have ~ 500 000 rows and S would have up to 100 elements.

Approach so far: let m = size(M,1), n = size(M,2) (size of the partitioning subsets), s = |S| (size of the set to be partitioned) and k = s/n (number of necessary disjoint rows to form a partition - you can assume s = 0 mod n, so the problem is well-defined). Note that, in order to establish such a partition, it suffices to check for disjointness of the rows and that there are exactly k many of them.

For j = 1:(m-k+1), I observe ind = (sum(ismember(M((j+1):m,:),M(j,:)),2)==0), which gives me the column of indices of the rows after M(j,:) that are also disjoint with it. Then I create 2 x n matrices by combining M(j,:) with each of those rows. After that I'd like to repeat the ismember()routine with all these new 2 x n matrices and keep repeating until I get k x nmatrices. Which is probably all nice and dandy, but it's also where I stumble, because, for one, the number of for-routines depends on k.

Questions:

(Q1) How can I complete the code for my approach ?

(Q2) Are there better approaches to this problem (i.e. faster, vectorized/less for loops, GPU-assisted) and if yes, what are they?

Upvotes: 4

Views: 929

Answers (2)

nicolas
nicolas

Reputation: 3280

Is this what you want? I assumed that the order of the rows in M matters (i.e. permutting 2 rows is another valid M). This example gives you all the matrixes with 2 columns and 3 rows, where elements are all distinct and in {1,2,3,4,5,6}. However rows are a subset of {1,2,3,4,5,6} where the order does NOT matter. Not sure if this what you meant. If the order matters, we can add perms to get all possible permutations for each row returned by nchoosek.

I had to use a recursive function, so you would have to put it in a file:

function test()
    s = 1:6;
    n = 2;

    M_all = ProcessOneRow(s, n, {}, {});

    disp('All possilbe M:')
    M_all
end

function M_all = ProcessOneRow(s, n, cRows, M_all)

    %// find all possible rows for this level
    possibleRows = nchoosek(1:length(s), n);

    for t=1:size(possibleRows, 1)
        %// keep track of all rows so far
        cRows(end+1) = s(possibleRows(t,:));

        %// get elements remaining (remove this row)
        s_sub = s;
        s_sub(possibleRows(t,:)) = [];

        if isempty(s_sub)
            %// We found a new complete matrix M
            M = [];
            for p = 1:numel(cRows)
                M(p,:) = cRows{p};
            end
            M_all{end+1} = M;
        else
            %// Still not a complete M, so we process the rest
            M_all = ProcessOneRow(s_sub, n, cRows, M_all);
        end
        cRows(end) = [];
    end
end

Upvotes: 0

David Eisenstat
David Eisenstat

Reputation: 65458

This problem is a not-very-special case of exact cover. If you were working in C, I would suggest Knuth's Algorithm X, implemented with bit arrays instead of dancing links due to the apparent density of the instances with which you are concerned. I expect that MATLAB can accommodate you nearly as well.

As an integer, the set {1, 2, 5, 6} can be represented as 2**1 + 2**2 + 2**5 + 2**6. Then the intersection of two sets is represented as the bitwise and of their representations, and the union of two sets is the bitwise or. The empty set is 0. Unfortunately, for S = 100, you'll need to use two or more integers, which complicates life. After you take a set, you can detect intersections with other sets via a vectorized bitand.

Upvotes: 2

Related Questions