Reputation: 145
I wanted to speed up a method of a class called translate_dirac_delta. I used multiprocessing to fill the array via a shared array, following this demo https://jonasteuwen.github.io/numpy/python/multiprocessing/2017/01/07/multiprocessing-numpy-array.html. Measuring t1 - t0 around the call to the function, it appeared to be twice as fast with 4 cores. However, when I used the unix time command it's actually twice as slow. I know there will be some overhead in using multiprocessing, but I didn't expect it to be quite so much. The module I'm using, ssht, is a Cython wrapper which isn't public, so I can't provide a full MWE.
Timing/calling function
import time

import numpy as np
import pyssht as ssht  # cython wrapper

def translation(self, flm, pix_i, pix_j):
    t0 = time.time()
    glm = self.translate_dirac_delta(flm, pix_i, pix_j)
    t1 = time.time()
    print(t1 - t0)
    return glm
def calc_pixel_value(self, ind, pix_i, pix_j):
    # create Ylm corresponding to index
    ylm_harmonic = np.zeros((self.L * self.L), dtype=complex)
    ylm_harmonic[ind] = 1
    # convert Ylm from harmonic to pixel space
    ylm_pixel = ssht.inverse(ylm_harmonic, self.L, Method=self.method)
    # get conjugate of the value at pixel (i, j)
    ylm_omega = np.conj(ylm_pixel[pix_i, pix_j])
    return ylm_omega
Original
sys 0m1.5s
def translate_dirac_delta(self, flm, pix_i, pix_j):
    flm_trans = self.complex_translation(flm, pix_i, pix_j)
    return flm_trans

def complex_translation(self, flm, pix_i, pix_j):
    for ell in range(self.L):
        for m in range(-ell, ell + 1):
            ind = ssht.elm2ind(ell, m)
            conj_pixel_val = self.calc_pixel_value(ind, pix_i, pix_j)
            flm[ind] = conj_pixel_val
    return flm
Parallel
sys 0m1.5s
import multiprocessing
import multiprocessing.sharedctypes

def translate_dirac_delta(self, flm, pix_i, pix_j):
    # create arrays to store final and intermediate steps
    result_r = np.ctypeslib.as_ctypes(np.zeros(flm.shape))
    result_i = np.ctypeslib.as_ctypes(np.zeros(flm.shape))
    shared_array_r = multiprocessing.sharedctypes.RawArray(
        result_r._type_, result_r)
    shared_array_i = multiprocessing.sharedctypes.RawArray(
        result_i._type_, result_i)

    # ensure function is declared before the multiprocessing pool
    global complex_func

    def complex_func(ell):
        # store real and imag parts separately
        tmp_r = np.ctypeslib.as_array(shared_array_r)
        tmp_i = np.ctypeslib.as_array(shared_array_i)
        # perform translation
        for m in range(-ell, ell + 1):
            ind = ssht.elm2ind(ell, m)
            conj_pixel_val = self.calc_pixel_value(ind, pix_i, pix_j)
            tmp_r[ind] = conj_pixel_val.real
            tmp_i[ind] = conj_pixel_val.imag

    # initialise pool and apply function
    with multiprocessing.Pool() as p:
        p.map(complex_func, range(self.L))

    # retrieve real and imag components
    result_r = np.ctypeslib.as_array(shared_array_r)
    result_i = np.ctypeslib.as_array(shared_array_i)
    # combine results
    return result_r + 1j * result_i
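Since ssht isn't public, here is a minimal self-contained sketch of the same RawArray + Pool pattern, with a hypothetical dummy_pixel_value standing in for calc_pixel_value. It assumes the fork start method (the Linux default), under which child processes inherit the shared buffers:

```python
import multiprocessing
import multiprocessing.sharedctypes

import numpy as np

N = 16  # number of indices, standing in for L * L

# shared real/imag buffers, analogous to shared_array_r / shared_array_i
_zeros = np.ctypeslib.as_ctypes(np.zeros(N))
shared_r = multiprocessing.sharedctypes.RawArray(_zeros._type_, _zeros)
_zeros = np.ctypeslib.as_ctypes(np.zeros(N))
shared_i = multiprocessing.sharedctypes.RawArray(_zeros._type_, _zeros)

def dummy_pixel_value(ind):
    # hypothetical stand-in for the real per-index work (ssht.inverse etc.)
    return complex(ind, -ind)

def worker(ind):
    # children inherit the shared buffers, so writes are visible to the parent
    tmp_r = np.ctypeslib.as_array(shared_r)
    tmp_i = np.ctypeslib.as_array(shared_i)
    val = dummy_pixel_value(ind)
    tmp_r[ind] = val.real
    tmp_i[ind] = val.imag

def run():
    with multiprocessing.Pool() as p:
        p.map(worker, range(N))
    return np.ctypeslib.as_array(shared_r) + 1j * np.ctypeslib.as_array(shared_i)

if __name__ == "__main__":
    print(run()[:4])
```

On platforms that use the spawn start method, the shared buffers would have to be passed to the workers via a Pool initializer instead of module globals.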
Upvotes: 0
Views: 80
Reputation: 40753
For a given process, user and sys time are the cumulative time spent by the process and its children executing program code and kernel calls respectively. The time function returns wall time (real time), which is more like a stopwatch, letting you measure the time elapsed between one moment and the next.
It is no surprise that your multiprocessing solution uses more user time than your original solution, as extra time is spent copying data between the parent and child processes. However, your job still completes in less real time overall.
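You can observe the distinction from inside Python, a sketch using only the standard library: time.perf_counter gives wall time, while os.times reports cumulative user CPU time including waited-for children, which is what the user column of the time command accumulates:

```python
import multiprocessing
import os
import time

def burn(n):
    # busy loop to accumulate user CPU time in a child process
    total = 0
    for i in range(n):
        total += i * i
    return total

def measure():
    wall0 = time.perf_counter()
    before = os.times()
    with multiprocessing.Pool(2) as p:
        p.map(burn, [200_000] * 4)
    wall = time.perf_counter() - wall0
    after = os.times()
    # children_user accumulates once the pool's workers have been joined
    return wall, after.children_user - before.children_user

if __name__ == "__main__":
    wall, child_user = measure()
    print(f"wall (what time.time measures): {wall:.3f}s")
    print(f"children user (what `time` adds to user): {child_user:.3f}s")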
https://en.wikipedia.org/wiki/Time_%28Unix%29
Upvotes: 2