Lei Hao

Reputation: 799

How to avoid copying data in joblib Parallel?

I have a function f(df, x), where df is a large DataFrame and x is a simple variable. The function f only reads from df and never modifies it. Is it possible to share the memory of df with the sub-processes, rather than copying it to each one, when using joblib.Parallel or another multiprocessing module?

  1. I'd like to avoid turning df into a global variable, because I want to reuse the code to process other data.
  2. Converting df into a NumPy array is not an option, because f needs to locate data using the index of df.

Edit:

Will df be copied to each sub-process when Parallel runs in the following code?

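from joblib import Parallel, delayed

def g(df):
    # f closes over df instead of receiving it as an argument
    def f(x):
        nonlocal df
        ...
        return z

    # run f on 10 workers, one task per element of iterables
    list_res = Parallel(n_jobs=10)(delayed(f)(x) for x in iterables)
    return list_res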
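
One way to check whether workers see the same df, sketched under assumptions: the tiny DataFrame, the "value" column, and the lookup inside f are hypothetical stand-ins for the real data and function. It passes df explicitly and requests joblib's thread-based backend with prefer="threads". Threads run inside the parent process, so each task should receive a reference to the same object rather than a pickled copy.

```python
import pandas as pd
from joblib import Parallel, delayed


def f(df, x):
    # Read-only lookup by index label. The identity of df is returned too,
    # so the caller can check that the worker saw the very same object.
    return id(df), df.loc[x, "value"]


def g(df, xs):
    # prefer="threads" asks joblib for its thread-based backend: workers run
    # in the parent process, so df is passed by reference, not pickled.
    return Parallel(n_jobs=2, prefer="threads")(delayed(f)(df, x) for x in xs)


# Hypothetical stand-in for the large DataFrame in the question.
df = pd.DataFrame({"value": [10, 20, 30]}, index=["a", "b", "c"])
results = g(df, ["a", "b", "c"])

# True when every task operated on the parent's DataFrame object.
same_object = all(obj_id == id(df) for obj_id, _ in results)
values = [int(v) for _, v in results]
```

Threads share the GIL, so this is likely to speed things up only if f spends most of its time in code that releases it, as many NumPy and pandas operations do. With process-based backends, df generally has to be serialized and sent to the workers, whether it is passed as an argument or captured in a closure.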

Upvotes: 0

Views: 24

Answers (0)
