EelkeSpaak
EelkeSpaak

Reputation: 2825

Algorithm for deciding which elements N and M with N:M relation to keep, which maximize resulting number of pairs

I have a bunch of elements of two classes N and M related to one another in a many-to-many fashion. Let's call them 'restaurants' and 'dishes'. I want to make a selection of restaurants and dishes that results in the largest possible number of restaurant-dish pairs. However every restaurant must serve every possible dish that I retain, and every dish must be available in every restaurant. How do I go about finding which restaurants and dishes I should kick out?

One possible (far from optimal) "solution" would be to only keep those restaurants serving all dishes (resulting in very few restaurants being retained), or those dishes served in all restaurants (resulting in very few dishes being retained). But that is not what I am after, as I want to determine the optimal way to make such a selection, resulting in a fair balance of restaurants and dishes being retained.

I am working in Python; in case it helps anyone, here some simple code that generates the type of many-to-many data I'm dealing with:

import numpy as np

# let a dish be identified by an integer between 0 and 100
dishes = np.arange(100)

restaurants = []
for k in range(30):
    # a restaurant will have multiple dishes
    # the amount of dishes will also vary per restaurant
    # e.g. between 10 and 100 here
    restaurants.append(
        np.random.choice(dishes,
            size=np.random.randint(10, 100),
            replace=False)
    )

Upvotes: 1

Views: 100

Answers (1)

Sergey Kalinichenko
Sergey Kalinichenko

Reputation: 726919

First, observe that the Restaurants-to-Dishes mapping is a bipartite graph. Next, observe that any solution to "all restaurants serve every dish, and all dishes are served by every restaurant" problem is a complete bipartite subgraph, or a biclique. Finally, since you are looking to maximize the number of dish-restaurant pairs (as opposed to maximizing the number of included dishes or the number of participating restaurants) the biclique that you are trying to find is called edge maximum (the other maximum biclique is called vertex maximum).

The problem, therefore, can be formally restated as follows:

Find an edge maximum biclique in a bipartite graph.

This problem has a fast algorithm described in this paper:

Yun Zhang, Elissa J. Chesler and Michael A. Langston

On Finding Bicliques in Bipartite Graphs: a Novel Algorithm with Application to the Integration of Diverse Biological Data Types

The paper has pseudocode on page 4.

Upvotes: 1

Related Questions