Reputation: 2560
I'm trying to multiprocess some existing code, and I'm finding that the pickling/unpickling of data to the processes is too slow with a Pool. I think for my situation a Manager will suffer the same issue, since it does the same pickling behind the scenes.
To solve the issue I'm trying to move to a shared-memory array. For this to work, I need an array of strings. It seems that multiprocessing.Array supports ctypes.c_char_p, but I'm having difficulty extending this into an array of strings. Below are a few of the many things I've tried.
#!/usr/bin/python
import ctypes
import multiprocessing as mp
import multiprocessing.sharedctypes as mpsc
import numpy

# Tested possible solutions
ver = 1
if 1 == ver:
    strings = mpsc.RawArray(ctypes.c_char_p, (' '*10, ' '*10, ' '*10, ' '*10))
elif 2 == ver:
    tmp_strings = [mpsc.RawValue(ctypes.c_char_p, ' '*10) for i in xrange(4)]
    strings = mpsc.RawArray(ctypes.c_char_p, tmp_strings)
elif 3 == ver:
    strings = []
    for i in xrange(4):
        strings.append(mpsc.RawValue(ctypes.c_char_p, 10))

def worker(args):
    snum, lenarg = args
    string = '%s' % snum
    string *= lenarg
    strings[snum] = string
    return string

# Main program
data = [(i, numpy.random.randint(1, 10)) for i in xrange(3)]
print 'Testing version ', ver
print
print 'Single process'
for x in map(worker, data):
    print '%10s : %s' % (x, list(strings))
print
print 'Multi-process'
pool = mp.Pool(3)
for x in pool.map(worker, data):
    print '%10s : %s' % (x, list(strings))
print '  ', [isinstance(s, str) for s in strings]
Note that I'm using multiprocessing.sharedctypes because I don't need locking, and it should be fairly interchangeable with multiprocessing.Array.
The issue with the above code is that the resulting strings object contains regular strings, not shared-memory strings coming out of the mpsc.RawArray constructor. With versions 1 and 2 you can see how the data gets scrambled when working out of process (as expected). For me, version 3 looked like it worked initially, but you can see the = is just rebinding the list slot to a regular string object; while this passes the short test, in the larger program it creates issues.
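To make the rebinding problem concrete, here is a minimal sketch (Python 3 syntax for brevity; the variable names are just illustrative):

```python
import ctypes
import multiprocessing.sharedctypes as mpsc

# Two shared one-byte values living in shared memory
slots = [mpsc.RawValue(ctypes.c_char, b' ') for _ in range(2)]

# Plain item assignment rebinds the list slot to an ordinary str;
# the RawValue is dropped and nothing is written to shared memory.
slots[0] = '0000'
print(type(slots[0]).__name__)   # str
print(type(slots[1]).__name__)   # c_char
```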
It seems like there should be a way to create a shared array of pointers where the pointers point to strings in shared-memory space. The c_void_p type complains if you try to initialize it with a c_char_p, and I haven't had any luck manipulating the underlying address pointers directly yet.
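For instance, here is a sketch of what I believe goes wrong with the pointer approach (Python 3 syntax, and admittedly my guess at the mechanics): the shared slot only ever stores a raw address into one process's heap.

```python
import ctypes
import multiprocessing.sharedctypes as mpsc

# A shared array of c_char_p "works" within a single process...
arr = mpsc.RawArray(ctypes.c_char_p, 4)
arr[0] = b'hello'                 # stores a pointer to this process's memory

# ...but what actually sits in shared memory is the raw address, which is
# meaningless in another process's address space.
addr = ctypes.cast(arr, ctypes.POINTER(ctypes.c_void_p))[0]
print(arr[0])                     # b'hello' (dereferenced locally)
print(isinstance(addr, int))     # True: just a local heap address
```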
Any help would be appreciated.
Upvotes: 3
Views: 3561
Reputation: 20214
First, your third solution doesn't actually work: strings isn't changed by the multiprocessing part; it was already modified by the single-process part. You can check this by commenting out the single-process part.
Second, this one will work:
import ctypes
import multiprocessing as mp
import multiprocessing.sharedctypes as mpsc
import numpy

strings = [mpsc.RawArray(ctypes.c_char, 10) for _ in xrange(4)]

def worker(args):
    snum, lenarg = args
    string = '%s' % snum
    string *= lenarg
    strings[snum].value = string
    return string

# Main program
data = [(i, numpy.random.randint(1, 10)) for i in xrange(4)]
print 'Multi-process'
print "Before: %s" % [item.value for item in strings]
pool = mp.Pool(4)
pool.map(worker, data)
print 'After : %s' % [item.value for item in strings]
output:
Multi-process
Before: ['', '', '', '']
After : ['0000000', '111111', '222', '3333']
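For readers on Python 3, a rough port of the same idea might look like this (a sketch with some assumptions: c_char buffers hold bytes rather than str, and it relies on the "fork" start method so the module-level list is inherited by the workers; under "spawn" each worker would re-create its own copy):

```python
import ctypes
import multiprocessing as mp
import multiprocessing.sharedctypes as mpsc

# One fixed-size shared buffer per slot; assigning .value copies the bytes
# into shared memory instead of storing a per-process pointer like c_char_p.
strings = [mpsc.RawArray(ctypes.c_char, 10) for _ in range(4)]

def worker(args):
    snum, lenarg = args
    s = str(snum).encode() * lenarg   # c_char buffers hold bytes in Python 3
    strings[snum].value = s           # copies into shared memory
    return s

if __name__ == '__main__':
    data = [(i, i + 1) for i in range(4)]
    with mp.get_context('fork').Pool(4) as pool:
        pool.map(worker, data)
    print([item.value for item in strings])   # [b'0', b'11', b'222', b'3333']
```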
Upvotes: 3