Reputation: 93
I have a large dataset in a list that I need to do some work on.
I want to have x threads working on the list at any given time, until everything in that list has been popped.
I know how to start x threads (let's say 20) at a given time (by calling thread1.start() ... thread20.start()),
but how do I make it start a new thread when one of the first 20 threads finishes, so that at any given time there are 20 threads running until the list is empty?
What I have so far:
import threading

class queryData(threading.Thread):
    def __init__(self, threadID):
        threading.Thread.__init__(self)
        self.threadID = threadID

    def run(self):
        global lst
        # Get trade from list
        trade = lst.pop()
        tradeId = trade[0][1][:6]
        print(tradeId)

thread1 = queryData(1)
thread1.start()
Update
I have something going with the following code:
threads = []
lock = threading.Lock()

for i in range(20):
    threads.append(queryData(i))

for thread in threads:
    thread.start()

while len(lst) > 0:
    for iter, thread in enumerate(threads):
        thread.join()
        lock.acquire()
        threads[iter] = queryData(iter)  # Replace the finished thread with a new one.
        threads[iter].start()
        lock.release()
Now it starts 20 threads in the beginning, and then keeps starting a new thread whenever one finishes.
However, it is not efficient, as it waits for the first thread in the list to finish, then the second, and so on.
Is there a better way of doing this?
Basically I need:
- Start 20 threads
- While the list is not empty:
  - wait for one of the 20 threads to finish
  - reuse it or start a new thread
Upvotes: 1
Views: 4316
Reputation: 123393
As I suggested in a comment, I think using a multiprocessing.pool.ThreadPool would be appropriate here, because it automatically handles much of the thread management you're doing manually in your code. Once all the tasks have been queued up for processing via ThreadPool's apply_async() method, the only thing left to do is wait until they've all finished executing (unless there's something else your code could be doing in the meantime, of course).
I've adapted the code from my linked answer to another related question so it more closely resembles what you appear to be doing, which should make it easier to follow in the current context.
from multiprocessing.pool import ThreadPool
from random import randint
import threading
import time

MAX_THREADS = 5
print_lock = threading.Lock()  # Prevent overlapped printing from threads.

def query_data(trade):
    trade_id = trade[0][1][:6]
    time.sleep(randint(1, 3))  # Simulate variable working time for testing.
    with print_lock:
        print(trade_id)

def process_trades(trade_list):
    pool = ThreadPool(processes=MAX_THREADS)
    results = []
    while trade_list:
        trade = trade_list.pop()
        results.append(pool.apply_async(query_data, (trade,)))
    pool.close()  # Done adding tasks.
    pool.join()   # Wait for all tasks to complete.

def test():
    trade_list = [[['abc', ('%06d' % id) + 'defghi']] for id in range(1, 101)]
    process_trades(trade_list)

if __name__ == "__main__":
    test()
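As a side note, apply_async() returns an AsyncResult object, which the code above collects in results but never uses. If query_data were changed to return a value instead of printing it, you could gather those values once the pool has finished. A minimal sketch of that variant (the return value and this modified process_trades() are assumptions, not part of the original answer):

def process_trades(trade_list):
    pool = ThreadPool(processes=MAX_THREADS)
    # apply_async() returns an AsyncResult; keep them so the values can be read later.
    results = [pool.apply_async(query_data, (trade,)) for trade in trade_list]
    pool.close()  # Done adding tasks.
    pool.join()   # Wait for all tasks to complete.
    return [r.get() for r in results]  # AsyncResult.get() returns query_data's return value.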
Upvotes: 7
Reputation: 1130
You can wait for a thread to complete with thread.join(). This call will block until that thread completes, at which point you can create a new one.
However, instead of respawning a Thread each time, why not recycle your existing threads?
This can be done with tasks, for example: keep the tasks in a shared collection, and when one of your threads finishes a task, it retrieves another one from that collection, as in the sketch below.
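A minimal sketch of that idea, using the standard library's queue.Queue as the shared collection and the same trade/lst structure as in the question (the worker function and NUM_WORKERS name are illustrative, not from the question):

import queue
import threading

NUM_WORKERS = 20

def worker(task_queue):
    # Each thread keeps pulling trades until the queue is drained, then exits.
    while True:
        try:
            trade = task_queue.get_nowait()
        except queue.Empty:
            return
        trade_id = trade[0][1][:6]
        print(trade_id)

task_queue = queue.Queue()
for trade in lst:  # lst is the question's list of trades.
    task_queue.put(trade)

threads = [threading.Thread(target=worker, args=(task_queue,)) for _ in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # Every worker has exited, so every trade has been processed.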
Upvotes: 1