NarenSuri
NarenSuri

Reputation: 51

how are the multiprocessing and threading and thread pooling working

https://code.tutsplus.com/articles/introduction-to-parallel-and-concurrent-programming-in-python--cms-28612 From this link I have studied, I have few questions Q1 : How thread pool (Concurrent) and threading are different here? why do we see the performance improvement. Threading with Que is having 4 threads and each runs cooperatively during the idle time and picks the item from the Que once they get website response. As i see, the thread pool is also in a way doing the same. completing its work and waiting for the manager to assign a task; which is very similar to picking a new item from the Que. I'm not sure how this is different and why i see the perfroamcne improvment. Seems i'm wrong in interpreting the poling here. Could you expalin

Q2 : Question 2 : using multiprocessing the time taken is more. If I have multiprocessor which can handle multiple processes at a time, then all my 4 processes should be handled by it at a time. That is the real parallelization is happening. Also, I have a question here - in such case since 4 processes are running same function doesn't GIL try to stop them executing the same piece of code. Lets suppose all of them share a common variable that gets updated - like number of websites checked. So how does GIL work in these cases of multiprocessing? Also, here are the same processes used again and again or they get killed and created every time after their job - I think same processes are used. Also, I think that the performance problem is because of the process creation compared to light weight threads at the concurrent threading phase - which is costly. So could you explain more in detail how the GIL is working here and process are running, are they running cooperatively (like each process wait for its turn - like threads in a process do). Or are these processes using the multiprocessors to run really parallel. Also, my other question is If I have a 8 core machine, I think I can run 8 threads of a same process simultaneously or parallel. if I have the 8 core machine can I run 2 processes with 4 threads each? can I run 8 processes on 8 cores? I think cores are only for threads of a process, which means I cant run the 8 process on 8 cores but I can run as many number of processes as many CPU's or multiprocessor system is mine, am i right? So can I run 2 processes with 4 threads each? on my 8 core machine with 2 multiprocessors and each processor having 4 cores each?

Upvotes: 1

Views: 621

Answers (1)

varun
varun

Reputation: 2107

Python has a rich set of libraries for multitasking with Processes and Threads. However, there is overlap between the libraries, the choice depends on how abstractly you view the computational tasks. For example, the concurrent.futures library views threads as asynchronous tasks, while the Threading library deals with them as high-level threads. Further, the _thread implements a low-level interface for threading exposing all the synchronization mechanisms.

The GIL(Global Interpreter Lock) is just a synchronization primitive, specifically a mutex which prevents multiple threads of the same process from executing Python bytecode fragments(for certain objects which need to remain consistent with concurrent operations). This is exactly why Python threads excel with I/O operations in terms of speed when compared to compute intensive tasks.(owing to the fact that the GIL is released in case of certain blocking calls/computationally intensive libraries such as numpy). Note that only CPython and Pypy versions of Python are constrained by the GIL mechanism.

Now, let's see those questions...

  1. How thread pool (Concurrent) and threading are different here? Why do we see the performance improvement?

Coming to the comparison between Threading and concurrent.futures.ThreadPoolExecutor (aka threading_squirrel vs future_squirrel), I've executed both programs with the same test case. There are two factors that contribute to this "performance improvement":

  • Network HEAD requests: Remember that network operations need not complete in the same time period every time you execute them... due to the very nature of packet transfer delays...

enter image description here

  • Order of thread execution: In the website you've linked, the author creates all threads initially, sets up the queue full of website links and then starts all of them in a list comprehension loop. In ThreadPoolExecutor of concurrent.futures, each time a task is submitted, a thread is assigned to it if the predefined maximum number of threads/workers have not been reached. I've changed the code to mirror this technique. It seems to give a speedup as the first thread begins work early on and doesn't need to wait for the queue to be filled up...

enter image description here

  1. How does GIL work in these cases of multiprocessing?

Remember that the GIL comes into effect for threads of a process only, not among processes. GIL locks up the whole interpreter bytecode during a thread of execution, so the other threads have to wait for their turn. This is the reason multiprocessing used processes instead of threads, as each process has it's own interpreter and consequently, it's own GIL.

  1. Are the same processes used again and again or they get killed and created every time after their job?

The concept of pooling is to reduce the overhead of creating and destroying workers(be it threads or processes) during computation. However, the processes are kind of "brand new" in the sense that the library effectively asks the OS to perform a fork in an UNIX based OS or spawn in an NT based OS...

  1. Also, are the processes running co-operatively?

Maybe. They have to run in co-operation if they use shared memory...(need not be running together). There is definitely going to be a context switch if there are more processes than the OS can allocate to its processors' cores. They can run in parallel if there's no shared memory updates to make.

  1. If I have the 8 core machine can I run 2 processes with 4 threads each? Can I run 8 processes on 8 cores?

Sure (subject to the GIL, in Python). Each process can be allocated to each processing unit for execution. A processing unit can be a physical or a virtual core of a CPU. As long as the OS scheduler supports it, it's possible. Any reasonable split up of processes and threads are possible. If all are allocatable, that's the best situation, else you will encounter context switches...(which are more expensive when it comes to processes)

Hope I've answered all those questions!

Here are a few resources:

MultiCore CPUs, Multithreading and context switching?

Why does multiprocessing use only a single core after I import numpy?

Bonus celery-squirrel resource

Upvotes: 2

Related Questions