automobile2combat

Reputation: 25

How to create a multiprocessing array of strings in Python?

I'm trying to create a multiprocessing array of strings with Python 3 on Ubuntu.

How do I declare the array? I've found that I can apparently use ctypes (specifically c_wchar_p for a string), so I tried the following:

from ctypes import c_wchar_p
from multiprocessing import Array

string_array = Array(c_wchar_p,range(10))

Upvotes: 1

Views: 978

Answers (1)

Booboo

Reputation: 44108

Actually, you cannot easily use c_wchar_p. Sure, the three statements you have will execute, but the purpose of creating data in shared memory is for multiple processes to access and update that data, and the problem is that if you do the following ...

string_array[0] = 'abc'

... you will be storing into shared memory the address of a string that lives in one particular process's address space, and that address will be invalid when the string is referenced by a process in a different address space. The documentation for multiprocessing.sharedctypes addresses this with the following note:

Note: Although it is possible to store a pointer in shared memory remember that this will refer to a location in the address space of a specific process. However, the pointer is quite likely to be invalid in the context of a second process and trying to dereference the pointer from the second process may cause a crash.
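To see why this is easy to miss, here is a minimal sketch (my illustration, not part of the question): within a single process the c_wchar_p array appears to work fine, because the stored address is valid there.

```python
from ctypes import c_wchar_p
from multiprocessing import Array

# An array of 10 string pointers, all initially None (NULL)
string_array = Array(c_wchar_p, 10)

# Assignment stores the *address* of the string 'abc' in this
# process's own memory into the shared array
string_array[0] = 'abc'

# Reading it back in the SAME process works, since the address is valid here
print(string_array[0])  # abc

# A second process attaching to this shared memory would read the same
# address, but it points into *this* process's address space;
# dereferencing it there may yield garbage or crash.
```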

You could instead create arrays of characters, each sized to the longest string you expect to store. The following code demonstrates this:

from ctypes import c_wchar
from multiprocessing.sharedctypes import RawArray
from multiprocessing import Pool

def init_pool(the_arr):
    global arr
    arr = the_arr

def worker():
    print(arr[0].value)
    print(arr[1].value)
    arr[2].value = 'It works!'

def main():
    # create a list of 10 RawArrays, each one capable of holding 20-character strings
    # The list itself is not meant to be modifiable, only the contained "strings"
    arr = [RawArray(c_wchar, 20) for _ in range(10)]
    arr[0].value = 'abc'
    arr[1].value = 'defghijklmn'
    # initialize process pool's processes' global variable arr
    pool = Pool(2, initializer=init_pool, initargs=(arr,))
    # worker function will execute in a different address space:
    pool.apply(worker)
    print(arr[2].value)

# Required for Windows:
if __name__ == '__main__':
    main()

Prints:

abc
defghijklmn
It works!
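If you are on Python 3.8+, another option (not part of the answer above, just a sketch) is multiprocessing.shared_memory.ShareableList, which stores fixed-capacity strings by value rather than by pointer; each slot's capacity is fixed by the encoded length of its initial item:

```python
from multiprocessing import Process
from multiprocessing.shared_memory import ShareableList

def worker(name):
    # attach to the existing shared list by name in the child process
    sl = ShareableList(name=name)
    sl[2] = 'It works!'  # must fit in the space reserved at creation
    sl.shm.close()

def main():
    # a 20-character placeholder reserves 20 bytes of capacity for slot 2
    sl = ShareableList(['abc', 'defghijklmn', ' ' * 20])
    p = Process(target=worker, args=(sl.shm.name,))
    p.start()
    p.join()
    print(sl[2])  # It works!
    sl.shm.close()
    sl.shm.unlink()

if __name__ == '__main__':
    main()
```

Unlike a managed list, a ShareableList cannot grow or shrink, but reads and writes go straight to shared memory with no proxy round-trip.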

If you need a list that is modifiable (capable of growing and shrinking), then you should use a managed list and forget about shared memory (this will run quite a bit more slowly if you have a lot of accesses, but is more "natural"):

from multiprocessing import Pool, Manager

def init_pool(the_arr):
    global arr
    arr = the_arr

def worker():
    print(arr[0])
    print(arr[1])
    arr.append('It works!')

def main():
    arr = Manager().list(['abc', 'defghijklmn'])
    # initialize process pool's processes' global variable arr
    pool = Pool(2, initializer=init_pool, initargs=(arr,))
    pool.apply(worker)
    print(arr[2])

# Required for Windows:
if __name__ == '__main__':
    main()

Upvotes: 2
