omargamal8

Reputation: 601

Having a shared object in python's Dask.multiprocessing

I am using the Dask library to create multi-threaded programs, but I am facing a problem here. Example:

from dask import compute, delayed
import dask.multiprocessing

arr = []

def add():
    arr.append("a")

tasks = [delayed(add)(), delayed(add)()]
compute(*tasks, get=dask.multiprocessing.get)
print(arr)

This code's output is simply `[]`, because I am using multiprocessing. If I use `get=dask.threaded.get` instead, the output is `['a', 'a']`.

I also need to use multiprocessing to achieve actual parallelism on multiple cores.

So my question is: is there a way to use dask.multiprocessing and still have the ability to access a shared object?

Upvotes: 1

Views: 1280

Answers (1)

MRocklin

Reputation: 57271

Under normal operation Dask assumes that functions do not rely on global state. Your functions should consume inputs and return outputs, and should not rely on any information other than what they are given.

Even when using the threaded scheduler you may want to avoid mutating global state, because that state might not be thread-safe.

Upvotes: 2
