Reputation: 525
I'm currently working on doing some shift scheduling simulations for a model taxicab company. The company operates 350 cabs, and all are in use on any given day. Drivers each work 5 shifts of 12 hours each, and the there are four overlapping shifts a day. There are shifts from 3:00-15:00, 15:00-3:00, 16:00-4:00, and 4:00-16:00. I developed it in Python originally, because of the need to rapidly develop it, and I thought that the performance would be acceptable. The original parameters only required two shifts a day (3:00-15:00, and 15:00-3:00), and while performance was not great, it was good enough for my uses. It could make a weekly schedule for the drivers in about 8 minutes, using a simple brute force algorithm (evaluates all potential swaps to see if the situation can be improved.)
With the four overlapping shifts, performance is absolutely abysmal. It takes a little over an hour to do a weekly schedule. I've done some profiling using cProfile, and it looks like the main culprits are two methods. One is a method to determine if there is a conflict when placing a driver in a shift. It makes sure that they are not serving in a shift on the same day, or serving in the preceding or following shifts. With only two shifts a day, this was easy. One simply had to determine if the driver was already scheduled to work in the shift directly before or after. With the four overlapping shifts, this has become more complicated. The second culprit is the method which determines whether the shift is a day or night shift. Again, with the original two shifts, this was as easy as determining if the shift number was even or odd, with shift numbers beginning at 0. The first shift (shift 0) was designated as a night shift, the next was day, and so on and so forth. Now the first two are night, the next two are, etc. These methods call each other, so I will put their bodies below.
def conflict_exists(shift, driver, shift_data):
next_type = get_stype((shift+1) % 28)
my_type = get_stype(shift)
nudge = abs(next_type - my_type)
if driver in shift_data[shift-2-nudge] or driver in shift_data[shift-1-nudge] or driver in shift_data[(shift+1-(nudge*2)) % 28] or driver in shift_data[(shift+2-nudge) % 28] or driver in shift_data[(shift+3-nudge) % 28]:
return True
else:
return False
Note that get_stype returns the type of the shift, with 0 indicating it is a night shift and 1 indicating it a day shift.
In order to determine the shift type, I'm using this method:
def get_stype(k):
if (k / 4.0) % 1.0 < 0.5:
return 0
else:
return 1
And here's the relevant output from cProfile:
ncalls tottime percall cumtime percall
57662556 19.717 0.000 19.717 0.000 sim8.py:241(get_stype)
28065503 55.650 0.000 77.591 0.000 sim8.py:247(in_conflict)
Does anyone have any sagely advice or tips on how I might go about improving the performance of this script? Any help would be greatly appreciated!
Cheers,
Tim
EDIT: Sorry, I should have clarified that the data from each shift is stored as a set i.e. shift_data[k] is of the set data type.
EDIT 2:
Adding main loop, as per request below, along with other methods called. It's a bit of a mess, and I apologize for that.
def optimize_schedule(shift_data, driver_shifts, recheck):
skip = set()
if len(recheck) == 0:
first_run = True
recheck = []
for i in xrange(28):
recheck.append(set())
else:
first_run = False
for i in xrange(28):
if (first_run):
targets = shift_data[i]
else:
targets = recheck[i]
for j in targets:
o_score = eval_score = opt_eval_at_coord(shift_data, driver_shifts, i, j)
my_type = get_stype(i)
s_type_fwd = get_stype((i+1) % 28)
if (my_type == s_type_fwd):
search_direction = (i + 2) % 28
end_direction = i
else:
search_direction = (i + 1) % 28
end_direction = (i - 1) % 28
while True:
if (end_direction == search_direction):
break
for k in shift_data[search_direction]:
coord = search_direction * 10000 + k
if coord in skip:
continue
if k in shift_data[i] or j in shift_data[search_direction]:
continue
if in_conflict(search_direction, j, shift_data) or in_conflict(i, k, shift_data):
continue
node_a_prev_score = o_score
node_b_prev_score = opt_eval_at_coord(shift_data, driver_shifts, search_direction, k)
if (node_a_prev_score == 1) and (node_b_prev_score == 1):
continue
a_type = my_type
b_type = get_stype(search_direction)
if (node_a_prev_score == 1):
if (driver_shifts[j]['type'] == 'any') and (a_type != b_type):
test_eval = 2
else:
continue
elif (node_b_prev_score == 1):
if (driver_shifts[k]['type'] == 'any') and (a_type != b_type):
test_eval = 2
else:
test_eval = 0
else:
if (a_type == b_type):
test_eval = 0
else:
test_eval = 2
print 'eval_score: %f' % test_eval
if (test_eval > eval_score):
cand_coords = [search_direction, k]
eval_score = test_eval
if (test_eval == 2.0):
break
else:
search_direction = (search_direction + 1) % 28
continue
break
if (eval_score > o_score):
print 'doing a swap: ',
print cand_coords,
shift_data[i].remove(j)
shift_data[i].add(cand_coords[1])
shift_data[cand_coords[0]].add(j)
shift_data[cand_coords[0]].remove(cand_coords[1])
if j in recheck[i]:
recheck[i].remove(j)
if cand_coords[1] in recheck[cand_coords[0]]:
recheck[cand_coords[0]].remove(cand_coords[1])
recheck[cand_coords[0]].add(j)
recheck[i].add(cand_coords[1])
else:
coord = i * 10000 + j
skip.add(coord)
if first_run:
shift_data = optimize_schedule(shift_data, driver_shifts, recheck)
return shift_data
def opt_eval_at_coord(shift_data, driver_shifts, i, j):
node = j
if in_conflict(i, node, shift_data):
return float('-inf')
else:
s_type = get_stype(i)
d_pref = driver_shifts[node]['type']
if (s_type == 0 and d_pref == 'night') or (s_type == 1 and d_pref == 'day') or (d_pref == 'any'):
return 1
else:
return 0
Upvotes: 9
Views: 8927
Reputation: 45039
It seems to me that swapping between the two day-shifts or between the two night-shifts will never help. It won't change the how well the drivers like the shifts and it won't change how those shift conflict with other shifts.
So I think you should be able to only plan two shifts initially, day and night, and only afterwards split the drivers assigned into the shifts into the two actual shifts.
Upvotes: 2
Reputation: 14529
Hmm. Interesting and fun-looking problem. I will have to look at it more. For now, I have this to offer: Why are you introducing floats? I would do get_stype() as follows:
def get_stype(k):
if k % 4 < 2:
return 0
return 1
It's not a massive speedup, but it's quicker (and simpler). Also, you don't have to do the mod 28 whenever you're feeding get_stype
, because that is already taken care of by the mod 4 in get_stype
.
If there are significant improvements to be had, they will come in the form of a better algorithm. (I'm not saying that your algorithm is bad, or that there is any better one. I haven't really spent enough time looking at it. But if there isn't a better algorithm to be found, then further significant speed increases will have to come from using PyPy, Cython, Shed Skin, or rewriting in a different (faster) language altogether.)
Upvotes: 3
Reputation: 67073
I don't think your problem is the time it takes to run those two functions. Notice that the percall value for the functions are 0.000
. This means that each time the function is invoked, it takes less than 1 millisecond.
I think your problem is the number of times the functions are called. A function call in python is expensive. For example, calling a function that does nothing 57,662,556 times takes 7.15 seconds on my machine:
>>> from timeit import Timer
>>> t = Timer("t()", setup="def t(): pass")
>>> t.timeit(57662556)
7.159075975418091
One thing I'd be curious about is the shift_data
variable. Are the values lists or dicts?
driver in shift_data[shift-2-nudge]
The in
will take O(N)
time if it's a list but O(1)
time if it's a dict.
EDIT: Since shift_data values are sets, that should be fine
Upvotes: 2
Reputation: 40340
There's nothing that would obviously slow these functions down, and indeed they aren't slow. They just get called a lot. You say you're using a brute force algorithm - can you write an algorithm that doesn't try every possible combination? Or is there a more efficient way of doing it, like storing the data by driver rather than by shift?
Of course, if you need instant speedups, it might benefit from running in an interpreter like PyPy, or using Cython to convert critical parts to C.
Upvotes: 4