guilherme3330

Reputation: 19

Cython: Speed up simple code

I am trying to speed up the following code with Cython. Two arguments are imported from python:

processing_times: List with 500 lists of 20 integers each (500x20).

sequence: List with 500 integers.

cimport cython
@cython.boundscheck(False)
@cython.wraparound(False)

cpdef taillard_acceleration(sequence, processing_times, int job_inserted, int num_machines):

    # Static arrays - number of jobs limited to 500 jobs and 20 machines
    cdef int e[501][21]
    cdef int q[501][21]
    cdef int f[501][21]
    cdef int ms[501]

    # Variables
    cdef int sequence_length, best_makespan, best_position
    cdef int i, j, iq, jq, tmp

    # Initialize some values
    sequence_length = len(sequence)
    iq = sequence_length + 1


    for i in range(1, sequence_length + 2):
        if i < sequence_length + 1:
            e[i][0] = 0

            # Q index I
            iq = iq - 1
            q[iq][num_machines + 1] = 0

        f[i][0] = 0
        jq = num_machines + 1


        for j in range(1, num_machines + 1):
            if i == 1:
                e[0][j] = 0
                q[sequence_length + 1][num_machines + 1 - j] = 0
            if i < sequence_length + 1:
                # Q Index J
                jq = jq - 1

                if e[i][j - 1] > e[i - 1][j]:
                    e[i][j] = e[i][j - 1] + processing_times[sequence[i - 1]-1][j-1]
                else:
                    e[i][j] = e[i - 1][j] + processing_times[sequence[i - 1]-1][j-1]

                if q[iq][jq + 1] > q[iq + 1][jq]:
                    q[iq][jq] = q[iq][jq + 1] + processing_times[sequence[iq - 1]-1][jq-1]
                else:
                    q[iq][jq] = q[iq + 1][jq] + processing_times[sequence[iq - 1]-1][jq-1]

            # f(ij) = max {f(i, j-1), e(i-1, j)}
            if f[i][j - 1] > e[i - 1][j]:
                f[i][j] = f[i][j - 1] + processing_times[job_inserted-1][j-1]
            else:
                f[i][j] = e[i - 1][j] + processing_times[job_inserted-1][j-1]


    # Makespan - job k in position i
    best_makespan = 0
    best_position = 0
    for i in range(1, sequence_length + 2):
        ms[i] = 0
        for j in range(1, num_machines + 1):
            tmp = f[i][j] + q[i][j]
            if tmp > ms[i]:
                ms[i] = tmp
        # Check best insertion position
        if ms[i] < best_makespan or best_makespan == 0:
            best_makespan = ms[i]
            best_position = i


    return best_position, best_makespan

I was able to get roughly a 4x speedup over the original Python code:

Just Python: 0.04114614830813535
With Cython: 0.00937230621550278
Cython is 4.390183948543561 times faster

How can I get a better speed improvement in this code?

I have already tried converting sequence and processing_times to numpy arrays outside the function and then using memoryviews, but I saw no improvement:

cpdef taillard_acceleration(sequence_np, processing_times_np, int job_inserted, int num_machines):

    # memory view
    cdef int [:, :] processing_times = processing_times_np
    cdef int [:] sequence = sequence_np

Also, should I use malloc for the q, e, f, and ms arrays? This is my first time using Cython, so I don't know if I'm doing this right. Any help is much appreciated.

Upvotes: 1

Views: 253

Answers (1)

DavidW

Reputation: 30890

Most of it looks like it's properly typed, so you're unlikely to get huge improvements. The main things that aren't typed are sequence and processing_times. You should make these memoryviews:

def taillard_acceleration(int[:] sequence, int[:,:] processing_times, int job_inserted, int num_machines):

I know you already tried this; however, you should also change the indexing to the form processing_times[i, j] (rather than processing_times[i][j]). What you're doing creates a 1D memoryview as a temporary object, which is likely to be a little slower.
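The difference can be illustrated with plain NumPy (a sketch of the indexing semantics, not the compiled Cython path): `a[1][2]` performs two indexing operations, materialising the row `a[1]` as a temporary 1D object first, while `a[1, 2]` reaches the element in a single 2D indexing operation.

```python
import numpy as np

a = np.arange(12, dtype=np.intc).reshape(3, 4)

# Chained indexing: a[1] builds a temporary 1D view,
# then [2] indexes into that view.
row = a[1]       # intermediate 1D object
chained = row[2]

# Tuple indexing: a single 2D lookup, no temporary row.
direct = a[1, 2]

assert chained == direct == 6
```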


For the q, e, f, and ms arrays: what you're doing now is absolutely fine if you're happy recompiling to change the sizes. If you think you might want to change the sizes at runtime, then you should allocate them at runtime. You could use malloc, but I'd use:

cdef int[:,::1] e = np.zeros((501, 21), dtype=np.intc)

(the [:,::1] tells Cython that the array is 2D and contiguous, and dtype=np.intc matches the C int type — np.zeros otherwise defaults to float64, which the int memoryview would reject). Using numpy like this will be very slightly slower than malloc, but it's also a lot easier and a lot less likely to go wrong. If you do this, then change the indexing to e[i, j] as described above.

(It looks like they really should be sized by sequence_length and num_machines, so a runtime size might be a good idea.)
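A sketch of what that runtime-sized allocation could look like, using a hypothetical helper name and the variable names from the question (in Cython the arrays would be typed as `cdef int[:, ::1]` / `cdef int[::1]`; the `+ 2` preserves the 1-based indexing and the extra sentinel row/column the loops touch):

```python
import numpy as np

def make_buffers(sequence_length, num_machines):
    """Allocate the e/q/f/ms scratch arrays at runtime instead of
    fixed 501x21 C arrays. Sizes follow the loop bounds in the
    question: indices run up to sequence_length + 1 and
    num_machines + 1 inclusive."""
    shape = (sequence_length + 2, num_machines + 2)
    e = np.zeros(shape, dtype=np.intc)   # earliest completion times
    q = np.zeros(shape, dtype=np.intc)   # tails (reverse pass)
    f = np.zeros(shape, dtype=np.intc)   # completion times with insertion
    ms = np.zeros(sequence_length + 2, dtype=np.intc)  # makespans
    return e, q, f, ms
```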

Upvotes: 1
