Reputation: 6841
The function store_in_shm writes a numpy array to shared memory, while the second function read_from_shm creates a numpy array backed by the same shared memory and returns it. However, running the code in Python 3.8 gives the following segmentation fault:
zsh: segmentation fault python foo.py
Why is there no problem accessing the numpy array from inside read_from_shm, but a segmentation fault occurs when accessing the numpy array again outside of the function?
Output:
From read_from_shm(): [0 1 2 3 4 5 6 7 8 9]
zsh: segmentation fault python foo.py
% /Users/athena/opt/anaconda3/envs/test/lib/python3.8/multiprocessing/resource_tracker.py:203: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
foo.py
import numpy as np
from multiprocessing import shared_memory

def store_in_shm(data):
    shm = shared_memory.SharedMemory(name='foo', create=True, size=data.nbytes)
    shmData = np.ndarray(data.shape, dtype=data.dtype, buffer=shm.buf)
    shmData[:] = data[:]
    shm.close()
    return shm

def read_from_shm(shape, dtype):
    shm = shared_memory.SharedMemory(name='foo', create=False)
    shmData = np.ndarray(shape, dtype, buffer=shm.buf)
    print('From read_from_shm():', shmData)
    return shmData

if __name__ == '__main__':
    data = np.arange(10)
    shm = store_in_shm(data)
    shmData = read_from_shm(data.shape, data.dtype)
    print('From __main__:', shmData)  # no seg fault if we comment this line
    shm.unlink()
Upvotes: 6
Views: 3878
Reputation: 11075
Basically the problem seems to be that the underlying mmap'ed file (owned by shm within read_from_shm) is closed when shm is garbage collected as the function returns. shmData then still refers to the closed mmap, which is what produces the segfault. This appears to be a known bug, but it can be solved by keeping a reference to shm.
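(As an aside, and purely illustrative rather than the asker's code: if a live view into the segment is not actually needed, another way to avoid the dangling reference is to copy the data out before the function returns, so the returned array owns its own buffer. The helper name read_copy_from_shm below is made up.)

def read_copy_from_shm(shape, dtype):
    shm = shared_memory.SharedMemory(name='foo', create=False)
    view = np.ndarray(shape, dtype, buffer=shm.buf)  # view into the shared segment
    result = view.copy()                             # result owns its own memory
    del view                                         # drop the view before closing the segment
    shm.close()                                      # safe: result no longer depends on shm.buf
    return result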
Additionally, all SharedMemory instances want to be close()'d, with exactly one of them being unlink()'ed when the memory is no longer necessary. If you don't call shm.close() yourself, it will be called at GC as mentioned, and on Windows, if it is the only handle currently "open", the shared memory file will be deleted. Calling shm.close() inside store_in_shm therefore introduces an OS dependency: on Windows the data will be deleted, while on macOS and Linux it will be retained until unlink is called.
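(A minimal sketch of that ownership pattern, not part of the question's code: the creating process wraps its work in try/finally so close() and the single unlink() happen even if something goes wrong, while every reader only ever calls close().)

import numpy as np
from multiprocessing import shared_memory

data = np.arange(10)
shm = shared_memory.SharedMemory(name='foo', create=True, size=data.nbytes)
try:
    buf = np.ndarray(data.shape, dtype=data.dtype, buffer=shm.buf)
    buf[:] = data[:]
    # ... hand 'foo', data.shape and data.dtype to the readers and wait for them ...
    del buf        # drop our view into the segment before closing it
finally:
    shm.close()    # every process closes its own handle
    shm.unlink()   # only the creating process unlinks, exactly once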
Finally, though this doesn't appear in your code, another problem currently exists where accessing the data from independent processes (rather than child processes) can similarly delete the underlying mmap too soon. SharedMemory is a very new library, and hopefully all the kinks will be worked out soon.
You can rewrite the given example to retain a reference to the "second" shm and just use either one to unlink:
import numpy as np
from multiprocessing import shared_memory

def store_in_shm(data):
    shm = shared_memory.SharedMemory(name='foo', create=True, size=data.nbytes)
    shmData = np.ndarray(data.shape, dtype=data.dtype, buffer=shm.buf)
    shmData[:] = data[:]
    # There must always be at least one `SharedMemory` object open for it to not
    # be destroyed on Windows, so we won't `shm.close()` inside the function,
    # but rather after we're done with everything.
    return shm

def read_from_shm(shape, dtype):
    shm = shared_memory.SharedMemory(name='foo', create=False)
    shmData = np.ndarray(shape, dtype, buffer=shm.buf)
    print('From read_from_shm():', shmData)
    return shm, shmData  # we need to keep a reference to shm both so we don't
                         # segfault on shmData and so we can `close()` it.

if __name__ == '__main__':
    data = np.arange(10)
    shm1 = store_in_shm(data)
    # This is where Windows previously reclaimed the memory, resulting in a
    # FileNotFoundError because the temporary mmap'ed file had been released.
    shm2, shmData = read_from_shm(data.shape, data.dtype)
    print('From __main__:', shmData)
    shm1.close()
    shm2.close()
    # On Windows "unlink" happens automatically here whether you call `unlink()` or not.
    shm2.unlink()  # either shm1 or shm2
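For completeness, here is the same set of rules applied to the more typical case of a separate consumer process. This is only a sketch; the worker function and the 'bar' segment name are made up for illustration. Every process close()s its own handle, and only the creator unlink()s.

import numpy as np
from multiprocessing import Process, shared_memory

def worker(name, shape, dtype):
    shm = shared_memory.SharedMemory(name=name, create=False)
    arr = np.ndarray(shape, dtype, buffer=shm.buf)
    print('From child process:', arr)
    shm.close()       # the child closes its handle but never unlinks
                      # (and must not touch arr after this point)

if __name__ == '__main__':
    data = np.arange(10)
    shm = shared_memory.SharedMemory(name='bar', create=True, size=data.nbytes)
    buf = np.ndarray(data.shape, dtype=data.dtype, buffer=shm.buf)
    buf[:] = data[:]
    p = Process(target=worker, args=('bar', data.shape, data.dtype))
    p.start()
    p.join()
    shm.close()       # the creator also closes its handle...
    shm.unlink()      # ...and is the only one that unlinks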
Upvotes: 7