Dividing a Python list into lists containing only unique values

Question

I am building a scheduling tool, which has to make a schedule with the numbers 0-16. It must adhere to the following rules:

The schedule is size (10,7)
No one number may occur twice in a column
All numbers must occur the same amount of times (if the schedule size is not divisible by the numbers to choose from, some numbers can occur once more)

To make sure I can adhere to the third rule, I created a pool in which each number occurs with the same frequency, while the leftover is divided through random choice.

schedule = np.zeros((10,7))
IDs = [i for i in range(17)]

numberOfTasks = schedule.size

rounds = math.floor(numberOfTasks / len(IDs))
leftover = numberOfTasks % len(IDs)

pool = [person for person in IDs for _ in range(rounds)]
pool += random.sample(IDs, k=leftover)

This works, I now have a list of IDs of the same size as the schedule, all I need to do is put them in. Now, to adhere to the second rule, I should pick each number only once in each day/column. I tried this:

for i in range(schedule.shape[1]):
    
    daySchedule = schedule[:, i]
    plannablePeople = list(np.unique(pool))

    for j in range(len(daySchedule)):

        pickedPerson = random.choice(plannablePeople)
        plannablePeople.remove(pickedPerson)
        pool.remove(pickedPerson)
        daySchedule[j] = pickedPerson
    
    # Checking the scheduled day and the pool
    print(daySchedule)
    print(pool)

    schedule[:, i] = daySchedule

However, with this method I end up with an error in the last column because some ID's are left multiple times and therefore the plannablePeople list will be too short. Does anyone know if there is a more efficient way of solving this?

I thought there should be a way to split a list into lists with only unique items, but I yet have to find out how.

L.Grozinger · Accepted Answer

It sounds like you want to generate random samples from your IDs, without replacement, for each day in your schedule.

To do this you can use numpy.random.choice. You will see from the docs that it takes a keyword argument, size, which is the number of samples to take, and another keyword argument, replace, whose default value is True.

So something like:

numpy.random.choice(IDs, size=numberOfTasks, replace=False)

will generate one day's worth of scheduling for you.

A more complete, but simple example is as follows:

import numpy
import itertools

ndays = 7
njobs = 10
people = range(17)

days = [numpy.random.choice(people, size=njobs, replace=False) for d in range(ndays)]
schedule = numpy.array(days)

which gives the example schedule:

array([[11, 12, 14,  2,  0,  3, 10,  1,  6, 13],
       [ 8, 15,  7,  0, 12,  3,  1,  6, 10, 13],
       [ 2,  9, 16,  4,  5, 15,  0,  8,  7, 11],
       [ 1,  4, 10, 16,  6, 12,  2, 15, 13,  9],
       [ 8,  1,  7, 13, 12,  0,  3, 15,  4,  9],
       [ 2,  5,  7,  3,  9, 10, 13, 15,  0,  8],
       [ 7, 13, 14,  6,  8, 16,  3, 11,  1,  9]])

A 'fair' schedule

Your requirement for some sort of fairness is more difficult to enforce. I'm not totally convinced that your strategy of using a kind of worker pool works in general, though it may work reasonably well most of the time. Here is a short example which uses a pool. (Note that the extra work of finding the remainder extra and supplementing the pool with randomly sampled workers is not IMO necessary, since you'll be randomly sampling from the pool anyway)

import numpy
import itertools

ndays = 7
njobs = 10
people = [p for p in range(17)]
pool = people * ndays

schedule = numpy.zeros((7, 10), numpy.int)

for day in range(ndays):
    schedule[day, :] = numpy.random.choice(numpy.unique(pool), size=njobs, replace=False)
    for person in schedule[day, :]:
        pool.remove(person)

which gives the example schedule:

array([[12, 13,  0,  1,  2, 11,  6,  8, 16,  9],
       [15,  8,  3, 10,  5, 12,  7,  0, 11,  4],
       [12,  7, 13,  4,  0,  3, 15,  9, 14, 10],
       [14,  6, 16,  9,  4, 15, 11,  5, 10,  3],
       [ 0, 13,  6,  1, 12,  5, 15,  4,  7,  9],
       [13, 15, 16,  3,  5,  2,  8,  4,  6,  7],
       [ 2,  3, 15,  5,  4, 10,  0,  8,  9,  1]])

(you can get a (10, 7) shape schedule with schedule.T)

With regard to your original example, the line pool.remove(pickedPerson) looks suspicious to me, and is more likely intended as plannablePeople.remove(pickedPerson). There is also a small mistake elsewhere, such as the indexing in daySchedule[i] = pickedPerson which probably should be daySchedule[j] = pickedPerson. After correcting these, the example code in your question works well for me.

Notice also that your problem almost identical to the problem of generating Latin Squares (actually Latin Rectangle in your case, which you could obtain from any Latin Square large enough), although generating a single Latin Square is easy enough, random sampling uniformly (i.e. fairly) from all Latin Squares is so hard that it is NP-complete AFAIK. This hints (though its certainly not a proof) that it might be very hard to enforce the fairness requirements in your problem too.

Dividing a Python list into lists containing only unique values

Answers (1)

Related Questions