Reputation: 145
I wanted to speed up a method of a class called translate_dirac_delta. I used multiprocessing to fill the array via a shared array, following this demo https://jonasteuwen.github.io/numpy/python/multiprocessing/2017/01/07/multiprocessing-numpy-array.html. Measuring t1 - t0 around the call to the function, it appeared to be twice as fast with 4 cores. However, when I used the unix time command it's actually twice as slow. I know there will be some overhead in using multiprocessing, but I didn't expect it to be quite so much. The module I'm using, ssht, is a Cython wrapper which isn't public, so I can't provide a full MWE.
Timing/calling function
import time

import numpy as np
import pyssht as ssht  # cython wrapper

def translation(self, flm, pix_i, pix_j):
    t0 = time.time()
    glm = self.translate_dirac_delta(flm, pix_i, pix_j)
    t1 = time.time()
    print(t1 - t0)
    return glm
def calc_pixel_value(self, ind, pix_i, pix_j):
    # create Ylm corresponding to index
    ylm_harmonic = np.zeros((self.L * self.L), dtype=complex)
    ylm_harmonic[ind] = 1
    # convert Ylm from harmonic to pixel space
    ylm_pixel = ssht.inverse(ylm_harmonic, self.L, Method=self.method)
    # get conjugate of the value at pixel (i, j)
    ylm_omega = np.conj(ylm_pixel[pix_i, pix_j])
    return ylm_omega
Original
sys 0m1.5s
def translate_dirac_delta(self, flm, pix_i, pix_j):
    flm_trans = self.complex_translation(flm, pix_i, pix_j)
    return flm_trans

def complex_translation(self, flm, pix_i, pix_j):
    for ell in range(self.L):
        for m in range(-ell, ell + 1):
            ind = ssht.elm2ind(ell, m)
            conj_pixel_val = self.calc_pixel_value(ind, pix_i, pix_j)
            flm[ind] = conj_pixel_val
    return flm
Parallel
sys 0m1.5s
import multiprocessing
import multiprocessing.sharedctypes

def translate_dirac_delta(self, flm, pix_i, pix_j):
    # create arrays to store final and intermediate steps
    result_r = np.ctypeslib.as_ctypes(np.zeros(flm.shape))
    result_i = np.ctypeslib.as_ctypes(np.zeros(flm.shape))
    shared_array_r = multiprocessing.sharedctypes.RawArray(
        result_r._type_, result_r)
    shared_array_i = multiprocessing.sharedctypes.RawArray(
        result_i._type_, result_i)

    # ensure function is declared before the multiprocessing pool
    global complex_func

    def complex_func(ell):
        # store real and imag parts separately
        tmp_r = np.ctypeslib.as_array(shared_array_r)
        tmp_i = np.ctypeslib.as_array(shared_array_i)
        # perform translation
        for m in range(-ell, ell + 1):
            ind = ssht.elm2ind(ell, m)
            conj_pixel_val = self.calc_pixel_value(ind, pix_i, pix_j)
            tmp_r[ind] = conj_pixel_val.real
            tmp_i[ind] = conj_pixel_val.imag

    # initialise pool and apply function
    with multiprocessing.Pool() as p:
        p.map(complex_func, range(self.L))

    # retrieve real and imag components
    result_r = np.ctypeslib.as_array(shared_array_r)
    result_i = np.ctypeslib.as_array(shared_array_i)
    # combine results
    return result_r + 1j * result_i
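Since ssht isn't public, here is a minimal self-contained sketch of the same RawArray + Pool pattern, with a hypothetical dummy_pixel_value standing in for calc_pixel_value. It assumes the fork start method (the Linux default), under which child processes inherit the shared buffers:

```python
import multiprocessing
import multiprocessing.sharedctypes

import numpy as np

N = 16  # number of indices, standing in for L * L

# shared real/imag buffers, analogous to shared_array_r / shared_array_i
_zeros = np.ctypeslib.as_ctypes(np.zeros(N))
shared_r = multiprocessing.sharedctypes.RawArray(_zeros._type_, _zeros)
_zeros = np.ctypeslib.as_ctypes(np.zeros(N))
shared_i = multiprocessing.sharedctypes.RawArray(_zeros._type_, _zeros)

def dummy_pixel_value(ind):
    # hypothetical stand-in for the real per-index work (ssht.inverse etc.)
    return complex(ind, -ind)

def worker(ind):
    # children inherit the shared buffers, so writes are visible to the parent
    tmp_r = np.ctypeslib.as_array(shared_r)
    tmp_i = np.ctypeslib.as_array(shared_i)
    val = dummy_pixel_value(ind)
    tmp_r[ind] = val.real
    tmp_i[ind] = val.imag

def run():
    with multiprocessing.Pool() as p:
        p.map(worker, range(N))
    return np.ctypeslib.as_array(shared_r) + 1j * np.ctypeslib.as_array(shared_i)

if __name__ == "__main__":
    print(run()[:4])
```

On platforms that use the spawn start method, the shared buffers would have to be passed to the workers via a Pool initializer instead of module globals.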
Upvotes: 0
Views: 80
Reputation: 40753
For a given process, user and sys time are the cumulative time spent by the process and its children executing program code and kernel calls respectively. The time function returns wall time (real time), which is more like a stopwatch, letting you measure the time elapsed between one moment and the next.
It is no surprise that your multiprocessing solution uses more user time than your original solution, as extra time is spent copying data between the parent and child processes. However, your job still completes in less real time overall.
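You can observe the distinction from inside Python, a sketch using only the standard library: time.perf_counter gives wall time, while os.times reports cumulative user CPU time including waited-for children, which is what the user column of the time command accumulates:

```python
import multiprocessing
import os
import time

def burn(n):
    # busy loop to accumulate user CPU time in a child process
    total = 0
    for i in range(n):
        total += i * i
    return total

def measure():
    wall0 = time.perf_counter()
    before = os.times()
    with multiprocessing.Pool(2) as p:
        p.map(burn, [200_000] * 4)
    wall = time.perf_counter() - wall0
    after = os.times()
    # children_user accumulates once the pool's workers have been joined
    return wall, after.children_user - before.children_user

if __name__ == "__main__":
    wall, child_user = measure()
    print(f"wall (what time.time measures): {wall:.3f}s")
    print(f"children user (what `time` adds to user): {child_user:.3f}s")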
https://en.wikipedia.org/wiki/Time_%28Unix%29
Upvotes: 2