How to find the maximum number of matching data?

Question

Given a bidimensionnal array such as:

 -----------------------
|   | 1 | 2 | 3 | 4 | 5 |
|-------------------|---|
| 1 | X | X | O | O | X |
|-------------------|---|
| 2 | O | O | O | X | X |        
|-------------------|---|
| 3 | X | X | O | X | X |       
|-------------------|---|
| 4 | X | X | O | X | X |       
 -----------------------

I have to find the largest set of cells currently containing O with a maximum of one cell per row and one per column.

For instance, in the previous example, the optimal answer is 3, when:

row 1 goes with column 4;
row 2 goes with column 1 (or 2);
row 3 (or 4) goes with column 3.

It seems that I have to find an algorithm in O(CR) (where C is the number of columns and R the number of rows).

My first idea was to sort the rows in ascending order according to its number on son. Here is how the algorithm would look like:

For i From 0 To R
    For j From 0 To N
        If compatible(i, j)
            add(a[j], i)

Sort a according to a[j].size

result = 0

For i From 0 To N
    For j From 0 to a[i].size
         if used[a[i][j]] = false
             used[a[i][j]] = true
             result = result + 1
             break

Print result

Altough I didn't find any counterexample, I don't know whether it always gives the optimal answer.

Is this algorithm correct? Is there any better solution?

qwwqwwq · Accepted Answer

Going off Billiska's suggestion, I found a nice implementation of the "Hopcroft-Karp" algorithm in Python here:

http://code.activestate.com/recipes/123641-hopcroft-karp-bipartite-matching/

This algorithm is one of several that solves the maximum bipartite matching problem, using that code exactly "as-is" here's how I solved example problem in your post (in Python):

from collections import defaultdict
X=0; O=1;
patterns = [ [ X , X , O , O , X ],
             [ O , O , O , X , X ],        
             [ X , X , O , X , X ],       
             [ X , X , O , X , X ]] 

G = defaultdict(list)

for i, x in enumerate(patterns):
    for j, y in enumerate(patterns):
        if( patterns[i][j] ):
            G['Row '+str(i)].append('Col '+str(j))

solution = bipartiteMatch(G) ### function defined in provided link
print len(solution[0]), solution[0]

How to find the maximum number of matching data?

Answers (1)

Related Questions