Drphoton

Reputation: 534

Difference between a "worker" and a "task" for concurrent.futures.ProcessPoolExecutor

I've got an "embarrassingly parallel" problem running in Python, and I thought I could use the concurrent.futures module to parallelize the computation. I've done this successfully before, but this is the first time I'm trying it on a computer that's more powerful than my laptop. The new machine has 32 cores / 64 threads, compared to 2/4 on my laptop.

I'm using a ProcessPoolExecutor object from the concurrent.futures module. I set the max_workers argument to 10 and then submit all of my jobs (there are maybe hundreds of them) one after another in a loop. The simulation seems to work, but there is some behaviour I don't understand, even after some intense googling. I'm running this on Ubuntu, so I use htop to monitor my processors. What I see is that:

  1. 10 processes are created.
  2. Each process uses > 100% CPU (say, up to 600%).
  3. A whole bunch of extra entries appear as well. (I think these are threads, not processes: when I press SHIFT+H, which toggles the display of userland threads in htop, they disappear. A quick check is sketched below the screenshot.)
  4. Most alarmingly, it looks like ALL of my processors spool up to 100%. (I'm talking about the "equalizer bars" at the top of the terminal.)

Screenshot of htop
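
One way I can check whether those extra entries are threads rather than processes is to read the "Threads:" field from /proc for one of the workers (a quick Linux-only sketch; the PID below is a placeholder for a worker PID taken from htop):

def thread_count(pid):
    # On Linux, /proc/<pid>/status contains a "Threads:" line
    # giving the number of threads in that process.
    with open(f'/proc/{pid}/status') as f:
        for line in f:
            if line.startswith('Threads:'):
                return int(line.split()[1])


print(thread_count(12345))  # placeholder PID; substitute a worker PID from htop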

My question is — if I'm only spinning up 10 workers, why do ALL of my processors appear to be running at maximum capacity? My working theory is that the 10 workers I request are "reserved," and the remaining processors just jump in to help out; if another user asked for processing power (not touching my 10 requested workers), those extra tasks would back off and give it up. But this isn't what "creating 10 processes" intuitively feels like to me.

If you want a MWE, this is roughly what my code looks like:

import concurrent.futures
from random import randrange


def expensive_function(arg):
    # Stand-in for the real (CPU-bound) workload.
    a = sum(range(10 ** arg))
    print(a)
    return a


def main():
    with concurrent.futures.ProcessPoolExecutor(max_workers=10) as executor:
        # Submit the tasks:
        futures = []
        for i in range(100):
            random_argument = randrange(5, 7)
            futures.append(executor.submit(expensive_function, random_argument))

        # Monitor your progress:
        num_results = len(futures)
        for k, _ in enumerate(concurrent.futures.as_completed(futures)):
            print(f'********** Completed {k + 1} of {num_results} simulations **********')


if __name__ == '__main__':
    main()

Upvotes: 1

Views: 1182

Answers (1)

Ahmed AEK

Reputation: 17591

Due to the GIL, a single process can have only one thread executing Python bytecode at any given time. So if you have 10 processes, you should have at most 10 threads (and therefore 10 cores) executing Python bytecode at once. However, this is not the full story.
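
To see the GIL in action, here is a small sketch (my own example, not from the question): the same CPU-bound pure-Python function barely speeds up with threads, which all share one GIL, but scales with processes, which each get their own interpreter and GIL:

import concurrent.futures
import time


def burn(_):
    # CPU-bound pure-Python work; holds the GIL the whole time.
    return sum(range(10 ** 7))


def timed(executor_cls):
    start = time.perf_counter()
    with executor_cls(max_workers=4) as ex:
        list(ex.map(burn, range(8)))
    return time.perf_counter() - start


def main():
    print('threads:  ', timed(concurrent.futures.ThreadPoolExecutor))
    print('processes:', timed(concurrent.futures.ProcessPoolExecutor))


if __name__ == '__main__':
    main()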

The contents of expensive_function matter here. Python creates 10 worker processes, so only 10 cores (plus the main process) can be executing Python bytecode at a given time, due to the GIL. However, if expensive_function does some sort of multithreading through an external C module (which doesn't have to abide by the GIL), then each of the 10 processes can have Y threads working in parallel, for a total of 10*Y cores being utilized at once. For example, your code might be running 6 native threads in each of the 10 processes, for a total of 60 threads running concurrently on 60 cores.
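
If that is what is happening, for instance if expensive_function uses NumPy built against OpenBLAS or MKL (an assumption; the real function isn't shown in the question), you can cap the native thread pools to one thread per worker by setting the standard environment variables before the library is imported:

import os

# These must be set before numpy (or any BLAS/OpenMP-backed library)
# is imported, and in the parent process, so the workers inherit them.
os.environ['OMP_NUM_THREADS'] = '1'        # OpenMP
os.environ['OPENBLAS_NUM_THREADS'] = '1'   # OpenBLAS
os.environ['MKL_NUM_THREADS'] = '1'        # Intel MKL

import numpy as np  # or whatever native-threaded library is actually in use

With the caps in place, htop should show roughly max_workers busy cores instead of all 64.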

However, that doesn't directly answer your question, so here is the main answer: max_workers is the number of worker processes, and therefore the number of cores that can execute Python bytecode at a given time (with a strong emphasis on "Python bytecode"), whereas the tasks are the total number of work items that will be executed by your workers; whenever a worker finishes the task at hand, it starts another one.
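
To make the worker/task distinction concrete, here is a minimal sketch (my own example): 100 tasks are queued, but only 3 worker processes ever exist, and each one picks up a new task as soon as it finishes the previous one:

import concurrent.futures
import os
import time


def task(i):
    time.sleep(0.1)        # stand-in for real work
    return os.getpid()     # report which worker process ran this task


def main():
    with concurrent.futures.ProcessPoolExecutor(max_workers=3) as executor:
        pids = list(executor.map(task, range(100)))
    # All 100 tasks are handled by the same 3 worker processes.
    print(f'100 tasks ran on {len(set(pids))} worker processes')


if __name__ == '__main__':
    main()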

Upvotes: 1
