usual me

Reputation: 8778

Python multiprocessing - When is a referenced object shared? When is it copied?

I have a data structure L (it could be a list, a dict, ...) and I need multiple processes to read from it. I don't want to use a multiprocessing.Manager because it's slow.

Now if L is never modified, the internet told me it won't be fully copied by the child processes thanks to copy-on-write. But what if L is referenced by object a, which itself is modified? Does copy-on-write still apply? Example:

from multiprocessing import Pool
from a import A

READONLYLIST = list(range(pow(10, 6)))  # list will never be modified
a = A(READONLYLIST)  # object a will be modified

def worker(x):
    return a.worker(x)

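if __name__ == "__main__":
    # Guard the pool so the script also runs under the "spawn" start method,
    # where each child re-imports this module.
    with Pool(2) as pool:
        print(pool.map(worker, range(10)))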

With module a as:

import random

class A(object):
    def __init__(self, readonlylist):
        self.readonlylist = readonlylist
        self.v = 0

    def worker(self, x):
        self.v = random.random()  # modify the object
        return x + self.readonlylist[-1]

Will READONLYLIST be fully copied by the child processes in this case?

Upvotes: 2

Views: 1178

Answers (1)

Michael

Reputation: 13914

Python multiprocessing does not share ordinary Python objects between processes. What it sends to a worker (the task arguments, the results, and the function to call) is passed by pickling, that is, by serializing the object to a byte stream. Functions are pickled by reference: only the module and function name are sent, not the code. So when you call a function within a pool, the main process pickles that reference together with the arguments, passes the pickled representation to a subprocess, and the subprocess unpickles it, looking the function up by name and rebuilding the arguments in its own separate memory.
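To make the pickling step concrete, here is a minimal sketch that uses only the standard pickle module and starts no processes. The small list `data` is a hypothetical stand-in for the question's READONLYLIST, and the built-in `pow` stands in for the worker function; neither comes from the original post.

```python
import pickle

# Hypothetical stand-in for the question's READONLYLIST, kept small here.
data = list(range(10))

# Pickling turns the object into a byte stream ...
payload = pickle.dumps(data)

# ... and unpickling builds a new, independent object from those bytes.
clone = pickle.loads(payload)
clone.append(-1)  # changing the clone leaves the original untouched

# Functions are pickled by reference: the payload holds the module and
# function name, and unpickling looks the function up again by that name.
func_payload = pickle.dumps(pow)
restored = pickle.loads(func_payload)

print(type(payload).__name__)                    # bytes
print(len(data), len(clone))                     # 10 11
print(b"pow" in func_payload, restored is pow)   # True True
```

Note that module-level globals such as READONLYLIST and a in the question are not part of this payload. Under the fork start method the child inherits them from the parent's memory; under spawn the child re-imports the main module and rebuilds them.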

Upvotes: 3
