tabdulla
tabdulla

Reputation: 525

Efficient Shift Scheduling in Python

I'm currently working on doing some shift scheduling simulations for a model taxicab company. The company operates 350 cabs, and all are in use on any given day. Drivers each work 5 shifts of 12 hours each, and the there are four overlapping shifts a day. There are shifts from 3:00-15:00, 15:00-3:00, 16:00-4:00, and 4:00-16:00. I developed it in Python originally, because of the need to rapidly develop it, and I thought that the performance would be acceptable. The original parameters only required two shifts a day (3:00-15:00, and 15:00-3:00), and while performance was not great, it was good enough for my uses. It could make a weekly schedule for the drivers in about 8 minutes, using a simple brute force algorithm (evaluates all potential swaps to see if the situation can be improved.)

With the four overlapping shifts, performance is absolutely abysmal. It takes a little over an hour to do a weekly schedule. I've done some profiling using cProfile, and it looks like the main culprits are two methods. One is a method to determine if there is a conflict when placing a driver in a shift. It makes sure that they are not serving in a shift on the same day, or serving in the preceding or following shifts. With only two shifts a day, this was easy. One simply had to determine if the driver was already scheduled to work in the shift directly before or after. With the four overlapping shifts, this has become more complicated. The second culprit is the method which determines whether the shift is a day or night shift. Again, with the original two shifts, this was as easy as determining if the shift number was even or odd, with shift numbers beginning at 0. The first shift (shift 0) was designated as a night shift, the next was day, and so on and so forth. Now the first two are night, the next two are, etc. These methods call each other, so I will put their bodies below.

def conflict_exists(shift, driver, shift_data):
    next_type = get_stype((shift+1) % 28)
    my_type = get_stype(shift)

    nudge = abs(next_type - my_type)

    if driver in shift_data[shift-2-nudge] or driver in shift_data[shift-1-nudge] or driver in shift_data[(shift+1-(nudge*2)) % 28] or driver in shift_data[(shift+2-nudge) % 28] or driver in shift_data[(shift+3-nudge) % 28]:
        return True
    else:
        return False

Note that get_stype returns the type of the shift, with 0 indicating it is a night shift and 1 indicating it a day shift.

In order to determine the shift type, I'm using this method:

def get_stype(k):
    if (k / 4.0) % 1.0 < 0.5:
        return 0
    else:
        return 1

And here's the relevant output from cProfile:

     ncalls  tottime  percall  cumtime  percall
     57662556   19.717    0.000   19.717    0.000 sim8.py:241(get_stype)
     28065503   55.650    0.000   77.591    0.000 sim8.py:247(in_conflict)

Does anyone have any sagely advice or tips on how I might go about improving the performance of this script? Any help would be greatly appreciated!

Cheers,

Tim

EDIT: Sorry, I should have clarified that the data from each shift is stored as a set i.e. shift_data[k] is of the set data type.

EDIT 2:

Adding main loop, as per request below, along with other methods called. It's a bit of a mess, and I apologize for that.

def optimize_schedule(shift_data, driver_shifts, recheck):
    skip = set()

    if len(recheck) == 0:
        first_run = True
        recheck = []
        for i in xrange(28):
            recheck.append(set())
    else:
        first_run = False

    for i in xrange(28):

        if (first_run):
            targets = shift_data[i]
        else:
            targets = recheck[i]

        for j in targets:
            o_score = eval_score = opt_eval_at_coord(shift_data, driver_shifts, i, j)

            my_type = get_stype(i)
            s_type_fwd = get_stype((i+1) % 28)

            if (my_type == s_type_fwd):
                search_direction = (i + 2) % 28
                end_direction = i
            else:
                search_direction = (i + 1) % 28 
                end_direction = (i - 1) % 28 

            while True:
                if (end_direction == search_direction):
                    break
                for k in shift_data[search_direction]:

                    coord = search_direction * 10000 + k 

                    if coord in skip:
                        continue

                    if k in shift_data[i] or j in shift_data[search_direction]:
                        continue

                    if in_conflict(search_direction, j, shift_data) or in_conflict(i, k, shift_data):
                        continue

                    node_a_prev_score = o_score
                    node_b_prev_score = opt_eval_at_coord(shift_data, driver_shifts, search_direction, k)

                    if (node_a_prev_score == 1) and (node_b_prev_score == 1):
                        continue

                    a_type = my_type
                    b_type = get_stype(search_direction)

                    if (node_a_prev_score == 1):
                        if (driver_shifts[j]['type'] == 'any') and (a_type != b_type):
                            test_eval = 2
                        else:
                            continue
                    elif (node_b_prev_score == 1):
                        if (driver_shifts[k]['type'] == 'any') and (a_type != b_type):
                            test_eval = 2
                        else:
                            test_eval = 0
                    else:
                        if (a_type == b_type):
                            test_eval = 0
                        else:
                            test_eval = 2

                    print 'eval_score: %f' % test_eval

                    if (test_eval > eval_score):

                        cand_coords = [search_direction, k]
                        eval_score = test_eval
                        if (test_eval == 2.0):
                            break
                else:
                    search_direction = (search_direction + 1) % 28
                    continue

                break

            if (eval_score > o_score):
                print 'doing a swap: ',
                print cand_coords,

                shift_data[i].remove(j)
                shift_data[i].add(cand_coords[1])

                shift_data[cand_coords[0]].add(j)   
                shift_data[cand_coords[0]].remove(cand_coords[1])

                if j in recheck[i]:
                    recheck[i].remove(j)

                if cand_coords[1] in recheck[cand_coords[0]]:               
                    recheck[cand_coords[0]].remove(cand_coords[1])

                recheck[cand_coords[0]].add(j)
                recheck[i].add(cand_coords[1])

            else:
                coord = i * 10000 + j
                skip.add(coord)

    if first_run:
        shift_data = optimize_schedule(shift_data, driver_shifts, recheck)

    return shift_data



def opt_eval_at_coord(shift_data, driver_shifts, i, j):
    node = j
    if in_conflict(i, node, shift_data):
        return float('-inf')
    else:
        s_type = get_stype(i)

        d_pref = driver_shifts[node]['type']

        if (s_type == 0 and d_pref == 'night') or (s_type == 1 and d_pref == 'day') or (d_pref == 'any'):
            return 1
        else:
            return 0

Upvotes: 9

Views: 8927

Answers (4)

Winston Ewert
Winston Ewert

Reputation: 45039

It seems to me that swapping between the two day-shifts or between the two night-shifts will never help. It won't change the how well the drivers like the shifts and it won't change how those shift conflict with other shifts.

So I think you should be able to only plan two shifts initially, day and night, and only afterwards split the drivers assigned into the shifts into the two actual shifts.

Upvotes: 2

John Y
John Y

Reputation: 14529

Hmm. Interesting and fun-looking problem. I will have to look at it more. For now, I have this to offer: Why are you introducing floats? I would do get_stype() as follows:

def get_stype(k):
    if k % 4 < 2:
        return 0
    return 1

It's not a massive speedup, but it's quicker (and simpler). Also, you don't have to do the mod 28 whenever you're feeding get_stype, because that is already taken care of by the mod 4 in get_stype.

If there are significant improvements to be had, they will come in the form of a better algorithm. (I'm not saying that your algorithm is bad, or that there is any better one. I haven't really spent enough time looking at it. But if there isn't a better algorithm to be found, then further significant speed increases will have to come from using PyPy, Cython, Shed Skin, or rewriting in a different (faster) language altogether.)

Upvotes: 3

jterrace
jterrace

Reputation: 67073

I don't think your problem is the time it takes to run those two functions. Notice that the percall value for the functions are 0.000. This means that each time the function is invoked, it takes less than 1 millisecond.

I think your problem is the number of times the functions are called. A function call in python is expensive. For example, calling a function that does nothing 57,662,556 times takes 7.15 seconds on my machine:

>>> from timeit import Timer
>>> t = Timer("t()", setup="def t(): pass")
>>> t.timeit(57662556)
7.159075975418091

One thing I'd be curious about is the shift_data variable. Are the values lists or dicts?

driver in shift_data[shift-2-nudge]

The in will take O(N) time if it's a list but O(1) time if it's a dict.

EDIT: Since shift_data values are sets, that should be fine

Upvotes: 2

Thomas K
Thomas K

Reputation: 40340

There's nothing that would obviously slow these functions down, and indeed they aren't slow. They just get called a lot. You say you're using a brute force algorithm - can you write an algorithm that doesn't try every possible combination? Or is there a more efficient way of doing it, like storing the data by driver rather than by shift?

Of course, if you need instant speedups, it might benefit from running in an interpreter like PyPy, or using Cython to convert critical parts to C.

Upvotes: 4

Related Questions