Vinith Vemana
Vinith Vemana

Reputation: 51

How to share large read only dictionary/list across processes in multiprocessing in python?

I have a 18Gb pickle file which i need to be accessing across processes. I tried using

from multiprocessing import Manager
import cPickle as pkl
manager = Manager()
data = manager.dict(pkl.load(open("xyz.pkl","rb")))

However, I am getting the following issue:

IOError: [Errno 11] Resource temporarily unavailable 

Someone suggested it might be because of socket timeout but it doesn't seem like it as increasing the timeout did not help. How do I go about this. Is there any other efficient way of sharing data across processes?

Upvotes: 2

Views: 2708

Answers (1)

torek
torek

Reputation: 487755

This is mostly a duplicate of Shared memory in multiprocessing, but you're specifically looking at a dict or list object, rather than a simple array or value.

Unfortunately, there is no simple way to share a dictionary or list like this, as the internals of a dictionary are complicated (and differ across different Python versions). If you can restructure your problem so that you can use an Array object, you can make a shared Array, fill it in once, and use it with no lock. This will be much more efficient in general.

It's also possible, depending on access patterns, that simply loading the object first, then creating your process pools, will work well on Unix-like systems with copy-on-write fork. But there are no guarantees here.

Note that the error you are getting, error number 11 = EAGAIN = "Resource temporarily unavailable", happens when trying to send an enormous set of values through a Manager instance. Managers don't share the underlying data at all: instead, they proxy accesses so that each independent process has its own copy, and when one process update a value, it sends the update to everyone participating in the illusion-of-sharing. In this way, everyone can (eventually, depending on access timing) agree as to what the values are, so that they all seem to be using the same values. In reality, though, each one has a private copy.

Upvotes: 9

Related Questions