daveruinseverything
daveruinseverything

Reputation: 5167

Find most efficient groups of pairs

Problem

I have a group of people, and I want each person to have a 1:1 meeting with every other person in the group. A given person can only meet with one other person at a time, so I want to do the following:

  1. Find every possible pairing combination
  2. Group pairs together into "rounds" of meetings, where each person can only be in a round once, and where a round should contain as many pairs as possible to satisfy all possible pairing combinations in the smallest number of rounds.

To demonstrate the problem in terms of desired input/output, let's say I have the following list:

>>> people = ['Dave', 'Mary', 'Susan', 'John']

I want to produce the following output:

>>> for round in make_rounds(people):
>>>     print(round)
[('Dave', 'Mary'), ('Susan', 'John')]
[('Dave', 'Susan'), ('Mary', 'John')]
[('Dave', 'John'), ('Mary', 'Susan')]

If I had an odd number of people, then I would expect this result:

>>> people = ['Dave', 'Mary', 'Susan']
>>> for round in make_rounds(people):
>>>     print(round)
[('Dave', 'Mary')]
[('Dave', 'Susan')]
[('Mary', 'Susan')]

The key to this problem is that I need my solution to be performant (within reason). I've written code that works, but as the size of people grows it becomes exponentially slow. I don't know enough about writing performant algorithms to know whether my code is inefficient, or whether I'm simply bound by the parameters of the problem

What I've tried

Step 1 is easy: I can get all possible pairings using itertools.combinations:

>>> from itertools import combinations
>>> people_pairs = set(combinations(people, 2))
>>> print(people_pairs)
{('Dave', 'Mary'), ('Dave', 'Susan'), ('Dave', 'John'), ('Mary', 'Susan'), ('Mary', 'John'), ('Susan', 'John')}

To work out the rounds themselves, I'm building a round like so:

  1. Create an empty round list
  2. Iterate over a copy of the people_pairs set calculated using the combinations method above
  3. For each person in the pair, check if there are any existing pairings inside the current round that already contain that individual
  4. If there's already a pair that contains one of the individuals, skip that pairing for this round. If not, add the pair to the round, and remove the pair from the people_pairs list.
  5. Once all the people pairs have been iterated over, append the round to a master rounds list
  6. Start again, since people_pairs now contains only the pairs that didn't make it into the first round

Eventually this produces the desired result, and whittles down my people pairs until there are none left and all the rounds are calculated. I can already see that this is requiring a ridiculous number of iterations, but I don't know a better way of doing this.

Here's my code:

from itertools import combinations

# test if person already exists in any pairing inside a round of pairs
def person_in_round(person, round):
    is_in_round = any(person in pair for pair in round)
    return is_in_round

def make_rounds(people):
    people_pairs = set(combinations(people, 2))
    # we will remove pairings from people_pairs whilst we build rounds, so loop as long as people_pairs is not empty
    while people_pairs:
        round = []
        # make a copy of the current state of people_pairs to iterate over safely
        for pair in set(people_pairs):
            if not person_in_round(pair[0], round) and not person_in_round(pair[1], round):
                round.append(pair)
                people_pairs.remove(pair)
        yield round

Plotting out the performance of this method for list sizes of 100-300 using https://mycurvefit.com shows that calculating rounds for a list of 1000 people would probably take around 100 minutes. Is there a more efficient way of doing this?

Note: I'm not actually trying to organise a meeting of 1000 people :) this is just a simple example that represents the matching / combinatorics problem I'm trying to solve.

Upvotes: 25

Views: 2379

Answers (7)

bestaim
bestaim

Reputation: 1

Don’t make a copy of the set every time through the list. That’s a big waste of time/memory. Instead, modify the set once after each iteration.

Keep a separate set of people in each round. Looking up a person in a set will be an order of magnitude faster than looping through the entire round.

Upvotes: 0

Steven Rumbalski
Steven Rumbalski

Reputation: 45542

This is an implementation of the algorithm described in the Wikipedia article Round-robin tournament.

from itertools import cycle , islice, chain

def round_robin(iterable):
    items = list(iterable)
    if len(items) % 2 != 0:
        items.append(None)
    fixed = items[:1]
    cyclers = cycle(items[1:])
    rounds = len(items) - 1
    npairs = len(items) // 2
    return [
        list(zip(
            chain(fixed, islice(cyclers, npairs-1)),
            reversed(list(islice(cyclers, npairs)))
        ))
        for _ in range(rounds)
        for _ in [next(cyclers)]
    ]

Upvotes: 17

soundstripe
soundstripe

Reputation: 1474

Two things you can do right off the bat:

  1. Don’t make a copy of the set every time through the list. That’s a big waste of time/memory. Instead, modify the set once after each iteration.

  2. Keep a separate set of people in each round. Looking up a person in a set will be an order of magnitude faster than looping through the entire round.

Ex:

def make_rounds(people):
    people_pairs = set(combinations(people, 2))

    while people_pairs:
        round = set()
        people_covered = set()
        for pair in people_pairs:
            if pair[0] not in people_covered \
               and pair[1] not in people_covered:
                round.add(pair)
                people_covered.update(pair)
        people_pairs -= round # remove thi
        yield round

Comparison: time comparison

Upvotes: 3

lenik
lenik

Reputation: 23508

I generate just the indices (because I have trouble coming up with 1000 names =), but for 1000 numbers the runtime is about 4 seconds.

The main problem of all other approaches -- they use pairs and work with them, there are plenty of pairs, and the run time is getting much longer. My approach differs in working with people, not pairs. I have a dict() that maps the person to the list of other persons (s)he has to meet, and these lists are at most N items long (not N^2, as with pairs). Hence the time savings.

#!/usr/bin/env python

from itertools import combinations
from collections import defaultdict

pairs = combinations( range(6), 2 )

pdict = defaultdict(list)
for p in pairs :
    pdict[p[0]].append( p[1] )

while len(pdict) :
    busy = set()
    print '-----'
    for p0 in pdict :
        if p0 in busy : continue

        for p1 in pdict[p0] :
            if p1 in busy : continue

            pdict[p0].remove( p1 )
            busy.add(p0)
            busy.add(p1)
            print (p0, p1)

            break

    # remove empty entries
    pdict = { k : v for k,v in pdict.items() if len(v) > 0 }

'''
output:
-----
(0, 1)
(2, 3)
(4, 5)
-----
(0, 2)
(1, 3)
-----
(0, 3)
(1, 2)
-----
(0, 4)
(1, 5)
-----
(0, 5)
(1, 4)
-----
(2, 4)
(3, 5)
-----
(2, 5)
(3, 4)
'''

Upvotes: 6

Mike Housky
Mike Housky

Reputation: 4069

Maybe I'm missing something (not entirely uncommon) but this sounds like a plain old round-robin tournament, where each team plays every other team exactly once.

There are O(n^2) methods to handle this "by hand", that work "by machine" just fine. One good description can be found in the Wikipedia article on Round-Robin Tournaments.

About that O(n^2): There will be either n-1 or n rounds, each requiring O(n) steps to rotate all but one table entry and O(n) steps to enumerate the n//2 matches in each round. You can make the rotation O(1) by using doubly-linked lists, but the enumeration of the matches is still O(n). So O(n)*O(n) = O(n^2).

Upvotes: 2

Zev
Zev

Reputation: 3491

When you need fast lookups hashes/dicts are the way to go. Keep track of whose been in each round in a dict rather than a list and it will be much faster.

Since you are getting in algorithms, studying big O notation will help you out and knowing what data structures are good at what kind of operations is also key. See this guide: https://wiki.python.org/moin/TimeComplexity for time complexity of Python built-ins. You'll see that checking for an item in a list is O(n) meaning that it scales linearly with the size of your inputs. So since it is in a loop you end up with O(n^2) or worse. For dicts, lookups are generally O(1) meaning that it doesn't matter the size of your input.

Also, don't override built-ins. I'd changed round to round_

from itertools import combinations

# test if person already exists in any pairing inside a round of pairs
def person_in_round(person, people_dict):
    return people_dict.get(person, False)

def make_rounds(people):
    people_pairs = set(combinations(people, 2))
    people_in_round = {}
    # we will remove pairings from people_pairs whilst we build rounds, so loop as long as people_pairs is not empty
    while people_pairs:
        round_ = []
        people_dict = {}
        # make a copy of the current state of people_pairs to iterate over safely
        for pair in set(people_pairs):
            if not person_in_round(pair[0], people_dict) and not person_in_round(pair[1], people_dict):
                round_.append(pair)
                people_dict[pair[0]] = True
                people_dict[pair[1]] = True


                people_pairs.remove(pair)
        yield round_

Upvotes: 3

Sam Kowalski
Sam Kowalski

Reputation: 31

This takes about 45s on my computer

def make_rnds(people):
    people_pairs = set(combinations(people, 2))
    # we will remove pairings from people_pairs whilst we build rnds, so loop as long as people_pairs is not empty
    while people_pairs:
        rnd = []
        rnd_set = set()
        peeps = set(people)
        # make a copy of the current state of people_pairs to iterate over safely
        for pair in set(people_pairs):
            if pair[0] not in rnd_set and pair[1] not in rnd_set:
                rnd_set.update(pair)
                rnd.append(pair)

                peeps.remove(pair[0])
                peeps.remove(pair[1])

                people_pairs.remove(pair)
                if not peeps:
                    break
        yield rnd

I dropped the function person_in_rnd to reduce time lost to function calls, and added a variable called rnd_set and peeps. rnd_set is a set of everyone in the round so far and is used to check for matches against the pair. peeps is a copied set of people, every time we add a pair to rnd we remove that those individuals from peeps. This lets us stop iterating through all combinations once peeps is empty, that is, once everyone is in a round.

Upvotes: 1

Related Questions