bivouac0

Reputation: 2560

Shared Memory Array of Strings with Multiprocessing

I'm trying to multiprocess some existing code and I'm finding that the pickling/unpickling of data to the processes is too slow with a Pool. I think for my situation a Manager will suffer the same issues since it does the same pickling behind the scenes.

To solve the issue I'm trying to move to a shared memory array. For this to work, I need an array of strings. It seems that multiprocessing.Array supports a ctypes.c_char_p, but I'm having difficulty extending this into an array of strings. Below are a few of the many things I've tried.

#!/usr/bin/python
import ctypes
import multiprocessing as mp
import multiprocessing.sharedctypes as mpsc
import numpy

# Tested possible solutions
ver = 1
if 1==ver:
    strings = mpsc.RawArray(ctypes.c_char_p, (' '*10, ' '*10, ' '*10, ' '*10))
elif 2==ver:
    tmp_strings = [mpsc.RawValue(ctypes.c_char_p, ' '*10) for i in xrange(4)]
    strings = mpsc.RawArray(ctypes.c_char_p, tmp_strings)
elif 3==ver:
    strings = []
    for i in xrange(4):
        strings.append( mpsc.RawValue(ctypes.c_char_p, 10) )

def worker(args):
    snum, lenarg = args
    string = '%s' % snum
    string *= lenarg
    strings[snum] = string
    return string

# Main program
data = [(i, numpy.random.randint(1,10)) for i in xrange(3)]
print 'Testing version ', ver
print
print 'Single process'
for x in map(worker, data):
    print '%10s : %s' % (x, list(strings))
print

print 'Multi-process'
pool = mp.Pool(3)
for x in pool.map(worker, data):
    print '%10s : %s' % (x, list(strings))
    print '            ', [isinstance(s, str) for s in strings]

Note that I'm using multiprocessing.sharedctypes because I don't need locking, and it should be fairly interchangeable with multiprocessing.Array.
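
(What I mean by interchangeable, as far as I understand it, is that passing lock=False hands back the same kind of raw, unsynchronized array. Sketch only:)

import ctypes
import multiprocessing as mp
import multiprocessing.sharedctypes as mpsc

raw1 = mpsc.RawArray(ctypes.c_char, 10)          # no lock wrapper
raw2 = mp.Array(ctypes.c_char, 10, lock=False)   # also returns a raw array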

The issue with the above code is that the resulting strings object ends up holding regular strings, not the shared memory strings that came out of the mpsc.RawArray constructor. With versions 1 and 2 you can see how the data gets scrambled when working out of process (as expected). Version 3 looked like it worked at first, but the = is just rebinding the list entry to a regular string; while that passes this short test, it creates issues in the larger program.

It seems like there should be a way to create a shared array of pointers where the pointers point to strings in shared memory space. The c_void_p type complains if you try to initialize it with a c_char_p value, and I haven't had any luck manipulating the underlying address pointers directly yet.
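
To make the pointer problem concrete, here is a rough sketch of my mental model (the cast is only there to inspect the stored address, and I may be off on the details):

import ctypes
import multiprocessing.sharedctypes as mpsc

arr = mpsc.RawArray(ctypes.c_char_p, 4)
arr[0] = 'hello'   # only the pointer value lands in shared memory...
addr = ctypes.cast(arr, ctypes.POINTER(ctypes.c_void_p))[0]
print hex(addr)    # ...and it points into this process's private heap,
                   # so a sibling process reading it sees garbage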

Any help would be appreciated.

Upvotes: 3

Views: 3561

Answers (1)

Sraw

Reputation: 20214

First, your third solution doesn't actually work: strings isn't being changed by the multiprocessing part, it was already modified by the single-process part. You can check this by commenting out the single-process part.

Second, this one will work:

import ctypes
import multiprocessing as mp
import multiprocessing.sharedctypes as mpsc
import numpy

# One fixed-size shared c_char buffer (10 bytes) per string
strings = [mpsc.RawArray(ctypes.c_char, 10) for _ in xrange(4)]

def worker(args):
    snum, lenarg = args
    string = '%s' % snum
    string *= lenarg
    strings[snum].value = string  # write into the shared buffer, don't rebind the list entry
    return string

# Main program
data = [(i, numpy.random.randint(1,10)) for i in xrange(4)]

print 'Multi-process'
print "Before: %s" % [item.value for item in strings]
pool = mp.Pool(4)
pool.map(worker, data)
print 'After : %s' % [item.value for item in strings]

output:

Multi-process
Before: ['', '', '', '']
After : ['0000000', '111111', '222', '3333']
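
As a follow-up, if you would rather keep everything in one shared block instead of a Python list of separate RawArrays, a variation is to pack fixed-width slots into a single RawArray of c_char. This is just a sketch: the 10-byte slot width and the helper names are arbitrary, and like the code above it relies on the workers inheriting the array via fork.

import ctypes
import multiprocessing as mp
import multiprocessing.sharedctypes as mpsc

NSLOTS, WIDTH = 4, 10                               # arbitrary sizes for the sketch
buf = mpsc.RawArray(ctypes.c_char, NSLOTS * WIDTH)  # one contiguous shared block

def set_slot(i, s):
    # overwrite slot i in place; the bytes themselves live in shared memory
    assert len(s) <= WIDTH
    addr = ctypes.addressof(buf) + i * WIDTH
    ctypes.memset(addr, 0, WIDTH)
    ctypes.memmove(addr, s, len(s))

def get_slot(i):
    # slicing a c_char array returns a plain string copy
    return buf[i * WIDTH:(i + 1) * WIDTH].rstrip('\x00')

def worker(args):
    snum, lenarg = args
    set_slot(snum, ('%s' % snum) * lenarg)

if __name__ == '__main__':
    pool = mp.Pool(NSLOTS)
    pool.map(worker, [(i, i + 3) for i in xrange(NSLOTS)])
    print [get_slot(i) for i in xrange(NSLOTS)]

The trade-off is that you fix a maximum string length up front, but reads and writes never allocate new shared objects.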

Upvotes: 3
