LKeene

Reputation: 647

Confusion regarding multiprocessing.pool memory usage in Python

I've been reading up on Python's "multiprocessing", specifically the "Pool" stuff. I'm familiar with threading but not the approach used here. If I were to pass a very large collection (say a dictionary of some sort) to the process pool ("pool.map(myMethod, humungousDictionary)"), are copies of the dictionary made in memory and then handed off to each process, or does there exist only the one dictionary? I'm concerned about memory usage. Thank you in advance.

Upvotes: 0

Views: 878

Answers (1)

cs95

Reputation: 402513

The short answer is: no, there is not just one dictionary. Each worker process runs in its own independent memory space, so the data handed to the workers is effectively duplicated.
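You can see this for yourself with a small sketch (names here are illustrative, not from your code): a worker mutates its own copy of a module-level dict, and the parent's copy is untouched after `pool.map` returns.

```python
import multiprocessing

data = {"count": 0}

def mutate(_):
    # Each worker process holds its own copy of `data`;
    # this increment is invisible to the parent process.
    data["count"] += 1
    return data["count"]

if __name__ == "__main__":
    with multiprocessing.Pool(2) as pool:
        pool.map(mutate, range(4))
    print(data["count"])  # still 0 in the parent
```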

If your dictionary is read-only and will never be modified, here are some options you could consider:

  1. Save your data into a database. Each worker reads the data it needs and works independently.
  2. Load the data in a parent process, then spawn the workers with os.fork. On Unix, the children inherit the parent's address space via copy-on-write, so the read-only dictionary is not physically duplicated.
  3. Use shared memory. Unix systems offer shared memory for interprocess communication. If there is any chance of a race condition, you will need semaphores as well.

You may also consider referring here for deeper insight on a possible solution.

Upvotes: 1
