Reputation: 31
I am a newbie of multiprocessing and i am using the said library in python to parallelize the computation of a parameter for the rows of a dataframe. The idea is the following: I have two functions, g for the actual computation and f for filling the dataframe with the computed values. I call the function f with pool.apply_async. The problem is that at the end of the poo.async the dataframe has not been updated even though a print inside f easily shows that it is saving correctly the values. So I thought to save the results in a file excel inside the f function as showed in my pseudo code below. However, what I obtain is that the file excel where i save the results stops to be updated after 2 values and the kernel keeps running even though the terminal shows that the script has computed all the values.
This is my pseudo code:
def g(path to image1, path to image 2):
#vectorize images
#does computation
return value #value is a float
def f(row, index):
value= g(row.image1, row.image2)
df.at[index, 'value'] = value
df.to_csv('dftest.csv')
return df
def callbackf(result):
global results
results.append(result)
inside the main:
results=[]
pool = mp.Pool(N_CORES)
for index, row in df.iterrows():
pool.apply_async(f,
args=(row, index),
callback=callbackf)
I tried to use with get_context("spawn").Pool() as pool inside the main as suggested by https://pythonspeed.com/articles/python-multiprocessing/ but it didn't solve my problem. What am I doing wrong? Is it possible that the vectorizing the images at each row causes problem to the multiprocessing?
Upvotes: 1
Views: 117
Reputation: 31
At the end I saved the results in a txt instead of a csv and it worked. I don't know why it didn't work with csv though.
Here's the code I put instead of the csv and pickle lines:
with open('results.txt', 'a') as f:
f.write(image1 +
'\t' + image2 +
'\t' + str(value) +
'\n')
Upvotes: 1