Reputation: 53
I may be in a little over my head here, but I am working on a little bioinformatics project in python. I am trying to parallelism a program that analyzes a large dictionary of sets of strings (~2-3GB in RAM). I find that the multiprocessing version is faster when I have smaller dictionaries but is of little benefit and mostly slower with the large ones. My first theory was that running out of memory just slowed everything and the bottleneck was from swapping into virtual memory. However, I ran the program on a cluster with 4*48GB of RAM and the same slowdown occurred. My second theory is that access to certain data was being locked. If one thread is trying to access a reference currently being accessed in another thread, will that thread have to wait? I have tried creating copies of the dictionaries I want to manipulate, but that seems terribly inefficient. What else could be causing my problems?
My multiprocessing method is below:
def worker(seqDict, oQueue):
#do stuff with the given partial dictionary
oQueue.put(seqDict)
oQueue = multiprocessing.Queue()
chunksize = int(math.ceil(len(sdict)/4)) # 4 cores
inDict = {}
i=0
dicts = list()
for key in sdict.keys():
i+=1
if len(sdict[key]) > 0:
inDict[key] = sdict[key]
if i%chunksize==0 or i==len(sdict.keys()):
print(str(len(inDict.keys())) + ", size")
dicts.append(copy(inDict))
inDict.clear()
for pdict in dicts:
p =multiprocessing.Process(target = worker,args = (pdict, oQueue))
p.start()
finalDict = {}
for i in range(4):
finalDict.update(oQueue.get())
return finalDict
Upvotes: 3
Views: 2842
Reputation: 123531
Seems like the data from the "large dictionary of sets of strings" could be reformatted into a something that could be stored in a file or string, allowing you to use the mmap
module to share it among all the processes. Each process might incur some startup overhead if it needs to convert the data back into some other more preferable form, but that could be minimized by passing each process something indicating what subset of the whole dataset in shared memory they should do their work on and only reconstitute the part required by that process.
Upvotes: 1
Reputation: 26160
As I said in the comments, and as Kinch said in his answer, everything passed through to a subprocess has to be pickled and unpickled to duplicate it in the local context of the spawned process. If you use multiprocess.Manager.dict
for sDict
(thereby allowing processes to share the same data through a server that proxies the objects created on it) and spawning the processes with slice indices in to that shared sDict
, that should cut down on the serialize/deserialize sequence involved in spawning the child processes. You still might hit bottlenecks with that though in the server communication step of working with the shared objects. If so, you'll have to look at simplifying your data so you can use true shared memory with multiprocess.Array
or multiprocess.Value
or look at multiprocess.sharedctypes
to create custom datastructures to share between your processes.
Upvotes: 2
Reputation: 1
Every data which is passed through the queue will be serialized and deserialized using pickle. I would guess this could be a bottleneck if you pass a lot of data round.
You could reduce the amount of data, make use of shared memory, write a multi-threading version in a c extension or try a multithreading version of this with a multithreading safe implemention of python (maybe jython or pypy; I don't know).
Oh and by the way: You are using multiprocessing and not multithreading.
Upvotes: 0