Arash

Reputation: 21

Multithreaded code with released GIL and complex per-thread work is slower in Python

I am working on an 8-core machine with 8 GB of RAM running Red Hat Linux 7, and I am using the PyCharm IDE.

I tried to use the Python threading module to take advantage of multi-core processing, but I ended up with much slower code. I released the GIL through Numba and made sure my threads are doing sufficiently complex calculations, so the problem is not the one discussed in, for example, How to make numba @jit use all cpu cores (parallelize numba @jit).

Here is the multithreaded code:

import threading
import numpy as np
import numba as nb

l=200

# Lennard-Jones force on particle i from all other particles; nogil=True
# releases the GIL so the jitted body can run in parallel with other threads.
@nb.jit('void(f8[:],f8,i4,f8[:])',nopython=True,nogil=True)
def force(r,ri,i,F):

   sum=0
   for j in range(12):
       if (j != i):
           fij=-4 * (12*1**12/(r[j]-ri)**13-6*1**6/(r[j]-ri)**7)
           sum=sum+fij

   F[i+12]=sum

# right-hand side for the ODE solver: spawns one thread per particle
def ODEfunction(r, t):

   f = np.zeros(2 * 12)

   lbound=-4* (12*1**12/(-0.5*l-r[0])**13-6*1**6/(-0.5*l-r[0])**7)
   rbound=-4* (12*1**12/(0.5*l-r[12-1])**13-6*1**6/(0.5*l-r[12-1])**7)

   f[0:12]=r[12:2*12]

   thlist=[threading.Thread(target=force, args=(r,r[i],i,f)) for i in range(12)]

   for thread in thlist:
       thread.start()
   for thread in thlist:
       thread.join()

   f[12]=f[12]+lbound
   f[2*12-1]=f[2*12-1]+rbound
   return f
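
For reference, the (r, t) signature suggests ODEfunction is fed to an ODE integrator; a minimal driver might look like this (an assumption - the question does not show how it is called, and the initial state below is made up):

from scipy.integrate import odeint
import numpy as np

# 12 positions spread inside the box plus 12 velocities (assumed zero)
r0 = np.concatenate([np.linspace(-0.4 * l, 0.4 * l, 12), np.zeros(12)])
t = np.linspace(0.0, 1.0, 101)
sol = odeint(ODEfunction, r0, t)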

And this is the sequential version:

import numpy as np
import numba as nb

l=200

@nb.autojit()  # lazy compilation; @nb.autojit is deprecated, current Numba versions use @nb.jit
def ODEfunction(r, t):

   f = np.zeros(2 * 12)
   lbound=-4* (12*1**12/(-0.5*l-r[0])**13-6*1**6/(-0.5*l-r[0])**7)
   rbound=-4* (12*1**12/(0.5*l-r[12-1])**13-6*1**6/(0.5*l-r[12-1])**7)

   f[0:12]=r[12:2*12]

   for i in range(12):
       fi = 0.0
       for j in range(12):
           if (j!=i):
               fij = -4 * (12*1**12/(r[j]-r[i])**13-6*1**6/(r[j]-r[i])**7)
               fi = fi + fij
       f[i+12]=fi
   f[12]=f[12]+lbound
   f[2*12-1]=f[2*12-1]+rbound
   return f

I have also attached an image of the system monitor during the run of the multithreaded and the sequential code:

System Monitor during the run of the multithreaded code

System Monitor during the run of the sequential code

Does anybody know what the reason for this inefficiency in the threaded code could be?

Upvotes: 1

Views: 713

Answers (1)

ead

Reputation: 34326

You must be aware that calling a function in Python is quite costly (compared, for example, to calling a function in C), and calling a numba-jitted function even more so: it must be checked that the parameters are right (i.e. that you are really passing float numpy arrays) - we will see that this can be a factor of 5 slower than a normal Python call.

Let's check the overhead of your jitted function compared to the work happening inside it:

import numba as nb
import numpy as np

# the force function from the question:
@nb.jit('void(f8[:],f8,i4,f8[:])',nopython=True,nogil=True)
def force(r,ri,i,F):

   sum=0
   for j in range(12):
       if (j != i):
           fij=-4 * (12*1**12/(r[j]-ri)**13-6*1**6/(r[j]-ri)**7)
           sum=sum+fij

   F[i+12]=sum

@nb.jit('void(f8[:],f8,i4,f8[:])',nopython=True,nogil=True)
def nothing(r,ri,i,F):
    # jitted no-op: measures the pure call overhead of a numba function
    pass

def no_jit(r,ri,i,F):
    # plain-Python no-op: the baseline cost of a normal Python call
    pass

F=np.zeros(24)
r=np.zeros(12)

And now:

>>>%timeit force(r,1.0,0,F)
706 ns ± 8.96 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
>>>%timeit nothing(r,1.0,0,F)
645 ns ± 5.36 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
>>>%timeit no_jit(r,1.0,0,F) #to measure overhead of numba
120 ns ± 6.56 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

So the call is basically 90% overhead (645 ns / 706 ns ≈ 91%), which you don't have in the single-threaded mode because the function is effectively "inlined" there. This is not surprising: your for-loop has only 12 iterations - that is just not enough work. In the question you have linked, for example, the inner loop has 10^10 iterations!

Additionally, there is also some overhead from dispatching the work between threads; my gut says it is even more than the overhead of the jitted call - but to be sure one would have to profile the program. These odds are quite hard to beat even with 8 cores!
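
For a rough feel for that dispatch cost, one can time spawning and joining twelve no-op threads, mirroring a single ODEfunction call (a minimal sketch; dispatch_only is just an illustration name):

import threading
import timeit

def dispatch_only():
    # create, start and join 12 threads that do no work, so only
    # the thread-management cost of one ODEfunction call is measured
    thlist = [threading.Thread(target=lambda: None) for _ in range(12)]
    for thread in thlist:
        thread.start()
    for thread in thlist:
        thread.join()

print(timeit.timeit(dispatch_only, number=1000) / 1000, "s per call")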

The biggest roadblock right now is probably the large overhead of the force call compared to the time spent in the function itself. The analysis above is quite superficial, so I cannot guarantee there are no other significant issues - but giving force bigger chunks of work would be a step in the right direction.
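
As an illustration of what "bigger chunks" could mean - this is only a sketch, and force_range is a hypothetical variant of your force that handles a whole slice of particles per call, so each thread amortizes the call and dispatch overhead over several particles:

import threading
import numpy as np
import numba as nb

@nb.jit('void(f8[:],i4,i4,f8[:])',nopython=True,nogil=True)
def force_range(r, start, stop, F):
    # forces on all particles in [start, stop) in one jitted call
    n = 12
    for i in range(start, stop):
        s = 0.0
        for j in range(n):
            if j != i:
                s += -4 * (12*1**12/(r[j]-r[i])**13 - 6*1**6/(r[j]-r[i])**7)
        F[i + n] = s

With this, ODEfunction could use e.g. 4 threads with 3 particles each instead of 12 threads with one particle each:

thlist = [threading.Thread(target=force_range, args=(r, 3*k, 3*(k+1), f))
          for k in range(4)]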

Upvotes: 2
