Reputation: 1143
I have a numpy array that I would like to share between a bunch of python processes in a way that doesn't involve copies. I create a shared numpy array from an existing numpy array using the sharedmem package.
import sharedmem as shm

def convert_to_shared_array(A):
    shared_array = shm.shared_empty(A.shape, A.dtype, order="C")
    shared_array[...] = A
    return shared_array
My problem is that each subprocess needs to access rows that are randomly distributed in the array. Currently I create a shared numpy array using the sharedmem package and pass it to each subprocess. Each process also has a list, idx, of rows that it needs to access. The problem is in the subprocess when I do:
#idx = list of randomly distributed integers
local_array = shared_array[idx,:]
# Do stuff with local array
It creates a copy of the array instead of just another view. The array is quite large, and rearranging it before sharing so that each process accesses a contiguous range of rows, like
local_array = shared_array[start:stop,:]
takes too long.
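The copy-vs-view behaviour described above can be checked directly with `np.shares_memory`; this is a minimal sketch using a plain NumPy array as a stand-in for the shared one (the indexing semantics are the same for a sharedmem-backed array):

```python
import numpy as np

# Stand-in for the shared array; copy-vs-view behaviour is plain NumPy
# indexing semantics and applies equally to a sharedmem-backed array.
shared_array = np.arange(20, dtype=np.float64).reshape(5, 4)

idx = [3, 0, 2]  # randomly distributed row indices

fancy = shared_array[idx, :]   # fancy (integer-array) indexing: always a copy
sliced = shared_array[1:4, :]  # basic slicing: a view

print(np.shares_memory(shared_array, fancy))   # False: it's a copy
print(np.shares_memory(shared_array, sliced))  # True: it's a view
```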
Question: What are good solutions for sharing random access to a numpy array between python processes that don't involve copying the array?
The subprocesses need readonly access (so no need for locking on access).
Upvotes: 5
Views: 620
Reputation: 80770
Fancy indexing induces a copy, so if you want to avoid copies you need to avoid fancy indexing; there is no way around it.
Upvotes: 1