DataFrame not updating through multiprocessing, and Python keeps running even after it has finished

Question

I am new to multiprocessing, and I am using Python's multiprocessing library to parallelize the computation of a parameter for each row of a DataFrame. The idea is the following: I have two functions, g for the actual computation and f for filling the DataFrame with the computed values. I call f with pool.apply_async. The problem is that after the pool.apply_async calls the DataFrame has not been updated, even though a print inside f clearly shows that it is saving the values correctly. So I decided to save the results to a CSV file from inside f, as shown in my pseudocode below. However, the file where I save the results stops being updated after 2 values, and the kernel keeps running even though the terminal shows that the script has computed all the values.

This is my pseudo code:

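import multiprocessing as mp

N_CORES = mp.cpu_count()  # number of worker processes (value assumed)


def g(image1_path, image2_path):
    # vectorize the two images
    # do the computation
    value = 0.0  # placeholder: the real code returns the computed float
    return value


def f(row, index):
    # compute the parameter for one row and store it in the DataFrame
    value = g(row.image1, row.image2)
    df.at[index, 'value'] = value
    df.to_csv('dftest.csv')

    return df


def callbackf(result):
    # collect whatever f returned
    results.append(result)


if __name__ == "__main__":
    # df: the DataFrame with the image1 and image2 path columns, loaded earlier
    results = []
    pool = mp.Pool(N_CORES)

    for index, row in df.iterrows():
        pool.apply_async(f,
                         args=(row, index),
                         callback=callbackf)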
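A likely explanation for the missing values: each worker process has its own memory, so the df that f sees is at best a copy of the parent's DataFrame. df.at inside f changes only that copy, and each worker then overwrites dftest.csv with its own, partly filled copy. A minimal sketch of a common alternative, assuming the path columns are named image1 and image2 and using a hypothetical stand-in for g: the worker returns (index, value), and only the parent writes into the DataFrame and the CSV file.

```python
import multiprocessing as mp

import pandas as pd

N_CORES = 4  # assumption: set to the number of cores you want to use


def g(image1_path, image2_path):
    # Hypothetical stand-in for the real image computation; returns a float.
    return float(len(image1_path) + len(image2_path))


def f(index, image1_path, image2_path):
    # Runs in a worker process. Return the result instead of writing into a
    # DataFrame, because this process only ever sees its own copy of df.
    return index, g(image1_path, image2_path)


def apply_results(df, results):
    # Runs in the parent process: results is a list of (index, value) pairs,
    # and the new column is aligned on the DataFrame index.
    df["value"] = pd.Series(dict(results))
    return df


if __name__ == "__main__":
    # Hypothetical example data with the two image-path columns.
    df = pd.DataFrame({"image1": ["a.png", "bb.png"], "image2": ["c.png", "d.png"]})

    pool = mp.Pool(N_CORES)
    async_results = [
        pool.apply_async(f, args=(index, row.image1, row.image2))
        for index, row in df.iterrows()
    ]
    pool.close()  # no more tasks will be submitted
    pool.join()   # wait until every worker has finished

    # .get() returns each task's (index, value) and re-raises any worker error.
    apply_results(df, [r.get() for r in async_results])
    df.to_csv("dftest.csv")  # written once, by the parent only
    print(df)
```

Keeping the AsyncResult objects and calling .get() also surfaces exceptions raised inside f; with only a callback and no error_callback, a failing task is dropped silently.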

I tried using with get_context("spawn").Pool() as pool inside the main block, as suggested by https://pythonspeed.com/articles/python-multiprocessing/, but it didn't solve my problem. What am I doing wrong? Could vectorizing the images for each row be causing problems for the multiprocessing?
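On the spawn suggestion: changing the start method does not give the workers access to the parent's DataFrame, so the copy problem remains. Two details matter when combining it with a with block. First, Pool.__exit__() calls terminate(), so results have to be collected inside the block. Second, under spawn the worker functions must be importable from a .py module; functions defined in an interactive session (for example a notebook cell) cannot be found by the child processes, which can make tasks fail in the workers and leave the pool looking stuck. A minimal sketch, again with a hypothetical stand-in for g and assumed image-path pairs:

```python
import multiprocessing as mp

N_CORES = 4  # assumption: set to the number of cores you want to use


def g(image1_path, image2_path):
    # Hypothetical stand-in for the real image computation; returns a float.
    return float(len(image1_path) + len(image2_path))


def compute_all(pairs, n_cores=N_CORES):
    # pairs: list of (image1_path, image2_path) tuples, one per DataFrame row.
    ctx = mp.get_context("spawn")
    with ctx.Pool(n_cores) as pool:
        # starmap blocks until every task is done, so the results are
        # collected before __exit__ terminates the pool.
        return pool.starmap(g, pairs)


if __name__ == "__main__":
    # Hypothetical rows; with a DataFrame this could be
    # list(zip(df["image1"], df["image2"])).
    pairs = [("a.png", "c.png"), ("bb.png", "d.png")]
    values = compute_all(pairs)
    print(values)  # one float per row, in input order
```

starmap returns values in the same order as pairs, so when pairs is built from the DataFrame in row order they can be assigned in the parent with df["value"] = values.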
