Ashwin Baldawa

Reputation: 51

Python Thread pool hanging the application

I am currently running the Python script below:

from concurrent.futures import ThreadPoolExecutor, as_completed

# Function to process MMR search
def process_mmr_search(row, itemdesc):
    try:
        formatted_itemdesc = str(row[itemdesc])
        print('formatted_itemdesc mmr', formatted_itemdesc)
        docs = indexed_taxonomy_described_cleaned.max_marginal_relevance_search(formatted_itemdesc, 20)
        print('docs mmr',docs)
        return [doc.page_content for doc in docs]
    except Exception as e:
        print(f"Error in MMR search: {e}")
        return []

# Function to handle threading for MMR search
def threaded_mmr_search(index, row, itemdesc):
    mmr_matches = process_mmr_search(row, itemdesc)
    return index, mmr_matches


# Run the MMR search with threading
with ThreadPoolExecutor(max_workers=4) as executor:  # Adjust max_workers based on available resources
    # Map each submitted future back to the row index it was created for
    future_mmr = {
        executor.submit(threaded_mmr_search, index, row, 'Material Description'): index
        for index, row in spend_sheet_uniques.iterrows()
    }
    
    for future in as_completed(future_mmr):
        index, mmr_matches = future.result()
        spend_sheet_uniques.at[index, 'Best_Matches_MMR'] = str(mmr_matches)

Objective: spend_sheet_uniques is a DataFrame. The goal is simply to run a similarity search for each of its rows; the vector index is FAISS.

Issue: After processing some rows, the application hangs and makes no further progress. It does not stop at any specific row; the stopping point differs from run to run, and only rarely are all rows processed.

Upvotes: 1

Views: 58

Answers (1)

ebonnal

Reputation: 1167

There is nothing wrong with how you use your thread pool; this looks like a deadlock. To isolate the tasks from one another, you can try a ProcessPoolExecutor instead:

if __name__ == "__main__":
    with ProcessPoolExecutor(...

Note: since your tasks look CPU-bound, parallelism ("true concurrency") via processes is probably what you need anyway (or running with the GIL disabled).
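
To make the suggestion above concrete, here is a minimal sketch of the same submit / as_completed pattern running on a ProcessPoolExecutor. It is not your actual search: search_one and run_in_processes are hypothetical names, and search_one only splits the text so the example runs without a FAISS index. Keep in mind that with processes the worker has to be a module-level function and its arguments and return values have to be picklable, so in the real script the index would most likely need to be loaded inside each worker rather than shared from the parent.

```python
from concurrent.futures import ProcessPoolExecutor, as_completed


def search_one(index, text):
    """Hypothetical stand-in for the real similarity search.

    In the real script this is where the FAISS index would be queried;
    here the text is just lowercased and split so the example is runnable.
    """
    return index, text.lower().split()


def run_in_processes(items, max_workers=4):
    """Run search_one over (index, text) pairs in separate processes."""
    results = {}
    with ProcessPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(search_one, index, text) for index, text in items]
        # Collect results in completion order, keyed by the original row index
        for future in as_completed(futures):
            index, matches = future.result()
            results[index] = matches
    return results


if __name__ == "__main__":
    # The guard keeps worker processes from re-running this block on import
    sample = [(0, "Steel Bolt M8"), (1, "Copper Wire 2mm")]
    print(run_in_processes(sample, max_workers=2))
```

Passing plain strings (for example str(row['Material Description'])) instead of whole DataFrame rows keeps what gets pickled per task small, and writing the results back into the DataFrame still happens in the parent process, as in your loop.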

Upvotes: 0
