Outcast

Reputation: 5117

Multiprocessing.pool - Pass another variable in the parallelisable function

Let's say that I have the following code:

path = "/my_path/"
filename_ending = '.json'


json_files = [file for file in os.listdir(path) if file.endswith(filename_ending)]


def read_extracted(name):
    with open(f"/my_path/{name}", 'r') as f:
        return json.load(f)


with mp.Pool(processes=os.cpu_count()-1) as pool:       
    json_list = pool.map(read_extracted, json_files) 

but I want to pass another variable to the read_extracted function that determines the path.

So I want the function to look like this (so that it can be used for other paths too):

def read_extracted(name, path):
    with open(f"{path}{name}", 'r') as f:
        return json.load(f)

But then how should this line:

json_list = pool.map(read_extracted, json_files) 

be written so that it works properly?

Is there any better option?

Upvotes: 0

Views: 21

Answers (1)

Lior Cohen

Reputation: 5745

You have two options:

The general option is to pass an iterable of sequences (for example, tuples):

json_files_and_path = [(f1, path), (f2, path)]
json_list = pool.map(read_extracted, json_files_and_path)

and change the function signature so it unpacks the tuple (pool.map passes each tuple as a single argument):

def read_extracted(args):
    # args is the whole (name, path) tuple
    name, path = args

The second option, specific to your case, is to pass a list of full paths:

json_files = ['path/to/f1', 'path/to/f2']

Upvotes: 1
