Reputation: 601
I am using dask
library trying to create multi threaded programs. But I am facing a problem here.
Example:
from dask import compute, delayed
import dask.multiprocessing
arr = []
def add():
arr.append("a")
tasks = [delayed(add)(),delayed(add)()]
compute(*tasks, get = dask.multiprocessing.get)
print(arr)
This code's output is simply [ ]
.. because I am using multiprocessing. If I am using get = dask.threaded.get
The code's output will be = ['a', 'a']
I also need to use multiprocessing to achieve actual parallelism on multiple cores.
So my question is.. is there a way to use dask.multiprocessing and still have the ability to access a shared object?
Upvotes: 1
Views: 1280
Reputation: 57271
Under normal operation Dask assumes that functions do not rely on global state. Your functions should consume inputs and return outputs and should not rely on any other information other than what they are given.
Even when using the threaded scheduler you might want to beware of affecting global state because that state might not be threadsafe.
Upvotes: 2