Reputation: 31
I have a function in which I create a pool of processes. Moreover, I use multiprocessing.Value() and multiprocessing.Lock() to manage some values shared between the processes.
I want to do the same thing with an array of objects in order to share it between processes but I don't know how to do it. I will only read from that array.
This is the function:
import glob
import os
from multiprocessing import Value, Pool, Lock, cpu_count

def predict(matches_path, unknown_path, files_path, imtodetect_path, num_query_photos, use_top3, uid, workbook, excel_file_path, modelspath, email_address):
    # Counters shared between the worker processes, guarded by shared_lock
    shared_correct_matched_imgs = Value('i', 0)
    shared_unknown_matched_imgs = Value('i', 0)
    shared_tot_imgs = Value('i', 0)
    counter = Value('i', 0)
    shared_lock = Lock()
    num_workers = cpu_count()
    feature = load_feature(modelspath)  # the read-only array I want to share
    pool = Pool(initializer=init_globals,
                initargs=[counter, shared_tot_imgs, shared_correct_matched_imgs, shared_unknown_matched_imgs,
                          shared_lock], processes=num_workers)
    # note: index and increment are not defined in this snippet
    for img in glob.glob(os.path.join(imtodetect_path, '*g')):
        pool.apply_async(predict_single_img, (img, imtodetect_path, excel_file_path, files_path, use_top3, uid, matches_path, unknown_path, num_query_photos, index, modelspath))
        index += increment
    pool.close()
    pool.join()
The array is created by the line feature = load_feature(modelspath). This is the array that I want to share.
In init_globals I initialize the shared values:
def init_globals(counter, shared_tot_imgs, shared_correct_matched_imgs, shared_unknown_matched_imgs, shared_lock):
    global cnt, tot_imgs, correct_matched_imgs, unknown_matched_imgs, lock
    cnt = counter
    tot_imgs = shared_tot_imgs
    correct_matched_imgs = shared_correct_matched_imgs
    unknown_matched_imgs = shared_unknown_matched_imgs
    lock = shared_lock
Upvotes: 1
Views: 237
Reputation: 11075
The easy way to provide shared static data is simply to make it a global variable accessible to the function you want to call. If you're using an operating system that supports "fork", it is very straightforward to use global variables in child processes as long as they're constant (if you modify them, the changes won't be reflected in the other processes):
import multiprocessing as mp
from random import randint

# randint needs integer bounds; a float such as 1e6 raises TypeError on Python 3.12+
shared = ['some', 'shared', 'data', f'{randint(0, 10**6)}']

def foo():
    print(' '.join(shared))

if __name__ == "__main__":
    mp.set_start_method("fork")
    # defining "shared" here would be valid also
    p = mp.Process(target=foo)
    p.start()
    p.join()
    print(' '.join(shared))  # same random number: the child inherited the parent's "shared" instead of rebuilding it
This won't work when using "spawn" as the start method (the only one available on Windows), because the memory of the parent is not shared in any way with the child, so the child must "import" the main file to gain access to whatever the target function is (this is also why you can run into problems with decorators). If you define your data outside the if __name__ == "__main__": block, it will kinda work, but each process will have made its own separate copy of the data, which can be undesirable if it's big, slow to create, or can change each time it's created.
import multiprocessing as mp
from random import randint

# Module-level code runs again in every spawned child when it imports this file
shared = ['some', 'shared', 'data', f'{randint(0, 10**6)}']

def foo():
    print(' '.join(shared))

if __name__ == "__main__":
    mp.set_start_method("spawn")
    p = mp.Process(target=foo)
    p.start()
    p.join()
    print(' '.join(shared))  # different number means different copy of "shared" (1 in a million chance of being the same, I guess...)
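Since the question already uses a Pool initializer, one option that should behave the same under both "fork" and "spawn" is to hand the read-only data to each worker through initargs. Below is a minimal sketch of that idea, not a drop-in replacement for predict(): the names init_worker, lookup, run and _feature are made up for illustration, and a small list stands in for the result of load_feature(modelspath). Under "spawn" the data is pickled and sent once per worker (not once per task), so it has to be picklable and each worker still holds its own copy.

```python
import multiprocessing as mp

_feature = None  # hypothetical per-worker global, filled in by init_worker


def init_worker(feature):
    """Runs once in every worker process when the pool starts."""
    global _feature
    _feature = feature


def lookup(i):
    """Hypothetical task: only reads from the data set up by init_worker."""
    return _feature[i]


def run(feature, indices, start_method=None):
    """Map lookup over indices; start_method=None uses the platform default."""
    ctx = mp.get_context(start_method)
    with ctx.Pool(processes=2, initializer=init_worker,
                  initargs=(feature,)) as pool:
        return pool.map(lookup, indices)


if __name__ == "__main__":
    # Stand-in for: feature = load_feature(modelspath)
    feature = ['some', 'read-only', 'data']
    print(run(feature, [2, 0, 1], start_method="spawn"))
```

Because the data is created inside the if __name__ == "__main__": block and passed explicitly, it is built only once in the parent, so every worker sees the same values even if creating it is slow or non-deterministic.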
Upvotes: 1