master_coder__
master_coder__

Reputation: 23

with concurrent.futures.ThreadPoolExecutor() as executor: ... does not wait

I am trying to use the ThreadPoolExecutor() in a method of a class to create a pool of threads that will execute another method within the same class. I have the with concurrent.futures.ThreadPoolExecutor()... however it does not wait, and an error is thrown saying there was no key in the dictionary I query after the "with..." statement. I understand why the error is thrown because the dictionary has not been updated yet because the threads in the pool did not finish executing. I know the threads did not finish executing because I have a print("done") in the method that is called within the ThreadPoolExecutor, and "done" is not printed to the console.

I am new to threads, so if any suggestions on how to do this better are appreciated!

    def tokenizer(self):
        all_tokens = []
        self.token_q = Queue()
        with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
            for num in range(5):
                executor.submit(self.get_tokens, num)
            executor.shutdown(wait=True)

        print("Hi")
        results = {}
        while not self.token_q.empty():
            temp_result = self.token_q.get()
            results[temp_result[1]] = temp_result[0]
            print(temp_result[1])
        for index in range(len(self.zettels)):
            for zettel in results[index]:
                all_tokens.append(zettel)
        return all_tokens

    def get_tokens(self, thread_index):
        print("!!!!!!!")
        switch = {
            0: self.zettels[:(len(self.zettels)/5)],
            1: self.zettels[(len(self.zettels)/5): (len(self.zettels)/5)*2],
            2: self.zettels[(len(self.zettels)/5)*2: (len(self.zettels)/5)*3],
            3: self.zettels[(len(self.zettels)/5)*3: (len(self.zettels)/5)*4],
            4: self.zettels[(len(self.zettels)/5)*4: (len(self.zettels)/5)*5],
        }
        new_tokens = []
        for zettel in switch.get(thread_index):
            tokens = re.split('\W+', str(zettel))
            tokens = list(filter(None, tokens))
            new_tokens.append(tokens)
        print("done")
        self.token_q.put([new_tokens, thread_index])

'''

Expected to see all print("!!!!!!") and print("done") statements before the print ("Hi") statement. Actually shows the !!!!!!! then the Hi, then the KeyError for the results dictionary.

Upvotes: 1

Views: 18433

Answers (2)

shmee
shmee

Reputation: 5101

As you have already found out, the pool is waiting; print('done') is never executed because presumably a TypeError raises earlier.
The pool does not directly wait for the tasks to finish, it waits for its worker threads to join, which implicitly requires the execution of the tasks to complete, one way (success) or the other (exception).

The reason you do not see that exception raising is because the task is wrapped in a Future. A Future

[...] encapsulates the asynchronous execution of a callable.

Future instances are returned by the executor's submit method and they allow to query the state of the execution and access whatever its outcome is.

That brings me to some remarks I wanted to make.

The Queue in self.token_q seems unnecessary
Judging by the code you shared, you only use this queue to pass the results of your tasks back to the tokenizer function. That's not needed, you can access that from the Future that the call to submit returns:

def tokenizer(self):
    all_tokens = []
    with ThreadPoolExecutor(max_workers=5) as executor:
        futures = [executor.submit(get_tokens, num) for num in range(5)]
        # executor.shutdown(wait=True) here is redundant, it is called when exiting the context:
        # https://github.com/python/cpython/blob/3.7/Lib/concurrent/futures/_base.py#L623

    print("Hi")
    results = {}
    for fut in futures:
        try:
            res = fut.result()
            results[res[1]] = res[0]
        except Exception:
            continue
    [...] 

def get_tokens(self, thread_index):
    [...]
    # instead of self.token_q.put([new_tokens, thread_index])
    return new_tokens, thread_index

It is likely that your program does not benefit from using threads
From the code you shared, it seems like the operations in get_tokens are CPU bound, rather than I/O bound. If you are running your program in CPython (or any other interpreter using a Global Interpreter Lock), there will be no benefit from using threads in that case.

In CPython, the global interpreter lock, or GIL, is a mutex that protects access to Python objects, preventing multiple threads from executing Python bytecodes at once.

That means for any Python process, only one thread can execute at any given time. This is not so much of an issue if your task at hand is I/O bound, i.e. frequently pauses to wait for I/O (e.g. for data on a socket). If your tasks need to constantly execute bytecode in a processor, there's no benefit for pausing one thread to let another execute some instructions. In fact, the resulting context switches might even prove detrimental.
You might want to go for parallelism instead of concurrency. Take a look at ProcessPoolExecutor for this.
However, I recommend to benchmark your code running sequentially, concurrently and in parallel. Creating processes or threads comes at a cost and, depending on the task to complete, doing so might take longer than just executing one task after the other in a sequential manner.


As an aside, this looks a bit suspicious:

for index in range(len(self.zettels)):
    for zettel in results[index]:
        all_tokens.append(zettel)

results seems to always have five items, because for num in range(5). If the length of self.zettels is greater than five, I'd expect a KeyError to raise here.
If self.zettels is guaranteed to have a length of five, then I'd see potential for some code optimization here.

Upvotes: 2

new-dev-123
new-dev-123

Reputation: 411

You need to loop over concurrent.futures.as_completed() as shown here. It will yield values as each thread completes.

Upvotes: 0

Related Questions