rxu

Reputation: 1407

Define Ctypes array that overlaps in memory for numpy array and multiprocessing

How can I define a ctypes array buffer that holds several numpy arrays of floats (say A, B, C) at one point in time and then holds several numpy arrays of integers (say D, E) at another point in time? Can this be done with some combination of ctypes, numpy, or multiprocessing in Python?

Thank you. I am trying to use less memory.

Upvotes: 0

Views: 393

Answers (1)

Dunes

Reputation: 40713

First, is your program using too much memory? If the answer is "no" or "I'm not sure", then ignore this and carry on until you know you really do have a problem.

Using the same buffer for different arrays

You can do all of what you want using "views" that are available within numpy. Views are just different ways of looking at the same data. For instance,

import numpy as np

ints32 = np.array([0, 0, 0, 0], dtype="<i4") # dtype string means little endian 4 byte ints
assert len(ints32) == 4
ints16 = ints32.view(dtype="<i2")
assert len(ints16) == 8 # a 16-bit int needs half as much space as a 32-bit int
ints32[0] = 0x11223344
assert ints16[0] == 0x3344
print(ints16) # prints [13124 4386 0 0 0 0 0 0]
# Thus, showing ints16 is backed by the same memory as ints32

You can also use an external buffer if you wish:

buffer = bytearray(8)
floats32 = np.frombuffer(buffer, dtype="<f4")
floats32[0] = 1
print(buffer) # shows buffer has been modified
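Applied to the question, one buffer can first be carved into several float arrays and later be reinterpreted as integer arrays. The following is only a sketch: the sizes, the dtypes (float64 and int32), and the layout of consecutive regions are assumptions chosen for illustration, using the `count` and `offset` parameters of `np.frombuffer`.

```python
import numpy as np

n = 4                          # assumed number of elements per float array
buffer = bytearray(3 * n * 8)  # room for three float64 arrays (96 bytes)

# Phase 1: A, B and C are float64 views onto consecutive regions of the buffer.
A = np.frombuffer(buffer, dtype=np.float64, count=n, offset=0)
B = np.frombuffer(buffer, dtype=np.float64, count=n, offset=n * 8)
C = np.frombuffer(buffer, dtype=np.float64, count=n, offset=2 * n * 8)
A[:] = 1.5
B[:] = 2.5
C[:] = 3.5
floats_total = float(A.sum() + B.sum() + C.sum())
del A, B, C  # drop the float views before reusing the memory

# Phase 2: the same 96 bytes now back two int32 arrays of 12 elements each.
D = np.frombuffer(buffer, dtype=np.int32, count=12, offset=0)
E = np.frombuffer(buffer, dtype=np.int32, count=12, offset=48)
D[:] = np.arange(12)
E[:] = -1
```

Writing to D and E overwrites whatever the float arrays held, so anything still needed from A, B or C has to be read out before the second phase starts.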

You need to be careful, as you may end up with size errors. The buffer's size in bytes must be a multiple of the new dtype's item size:

buf = np.zeros(3, dtype=np.int8) # 3 byte buffer
try:
    arr = buf.view(dtype=np.int16) # Error! Needs a buffer with multiples of 2 bytes
except ValueError as exc:
    print(exc)
two_byte_slice = buf[:2]
arr = two_byte_slice.view(dtype=np.int16) # Succeeds
arr[0] = 1
assert buf[0] == 1 # shows that two_byte_slice and arr are views of buf, not copies (assumes a little endian machine)
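One way to avoid the size error is to trim the buffer to the largest prefix that the target dtype fits into exactly. The sketch below assumes a one-dimensional, contiguous byte array; `view_as` is a hypothetical helper name, not a numpy function.

```python
import numpy as np


def view_as(buf, dtype):
    """Hypothetical helper: view the largest usable prefix of a 1-D byte array as dtype."""
    itemsize = np.dtype(dtype).itemsize
    usable = (buf.nbytes // itemsize) * itemsize  # round down to a whole number of items
    return buf[:usable].view(dtype)


buf = np.zeros(7, dtype=np.int8)  # 7 bytes: not a multiple of 2

try:
    buf.view(np.int16)
    failed = False
except ValueError:
    failed = True  # numpy refuses to view 7 bytes as 2 byte items

arr = view_as(buf, np.int16)  # 3 elements; the last byte of buf is left unused
arr[:] = 1
```

The returned array is still a view, so writes through `arr` land in `buf`; only the trailing byte is out of reach.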

Sharing the same buffer with different processes, or C libraries

Sharing buffers with C libraries or other processes carries certain risks. These risks are usually mitigated by copying the buffer immediately and only using the copy. However, if managed carefully, sharing can still be safe. For sharing a buffer with a C library, you must make sure:

  • That the C library doesn't hold on to a pointer to the input buffer after the buffer has been released by Python. This is implicitly fine if the C library does not hold on to a reference to the buffer after a function returns, or if you keep a global reference to the owning object.
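As a small illustration of the safe case, a C function can write into a numpy array's memory as long as the array stays alive for the duration of the call and the function keeps no pointer afterwards. This sketch uses `ctypes.memset` as a stand-in for a real C library function; the array size and fill value are arbitrary.

```python
import ctypes

import numpy as np

arr = np.zeros(4, dtype=np.uint8)

# arr.ctypes.data is the raw address of the array's memory. The name `arr`
# keeps the owning object alive while the C function runs, and memset does
# not retain the pointer after it returns.
ctypes.memset(arr.ctypes.data, 0xAB, arr.nbytes)

print(arr)  # every byte has been set by the C function
```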

Sharing the data with another process is more complicated, but it can also be made safe, provided that:

  • Any spawned process that intends to outlive its parent copies the data out of the buffer rather than using the buffer directly.
  • Two or more processes that share a buffer coordinate their access: a lock is assigned to guard the buffer and every process observes that lock.

See the following example for sharing a buffer with another process, and using a lock to synchronise access (strictly speaking the lock isn't necessary as the parent waits for the child to complete before continuing).

import numpy as np
import ctypes
from multiprocessing import Array, Process


def main():
    buf = Array(ctypes.c_int8, 10) # 10 byte buffer

    with buf: # acquire lock
        ctypes_arr = buf.get_obj()
        arr = np.frombuffer(ctypes_arr, dtype=np.int16) # int16 array, with size 5
        total = arr.sum()
        del arr, ctypes_arr # about to release the lock, so delete local references to the buffer

    print("total before:", total) # 0

    p = Process(target=subprocess_target, args=(buf,))
    p.start()
    p.join()

    with buf:
        # interpret first 8 bytes as two 4 byte ints
        view = memoryview(buf.get_obj())[:8]
        arr = np.frombuffer(view, dtype=np.int32)
        total = arr.sum()
        del arr, view

    print("total after:", total) # 262146
    raw_bytes = list(buf.get_obj())
    assert raw_bytes == [0, 0, 1, 0, 2, 0, 3, 0, 4, 0]


def subprocess_target(buf):
    """Sets elements in buf to [0, 1, ..., n-2, n-1]"""
    with buf:
        arr = np.frombuffer(buf.get_obj(), dtype=np.int16)
        arr[:] = range(len(arr))
        del arr


if __name__ == "__main__":
    main()
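For the case where a process should copy the data rather than keep using the shared buffer, a snapshot can be taken while holding the lock. This is a minimal sketch; `snapshot` is a hypothetical helper name, and it runs in a single process just to show that the copy is independent of the buffer.

```python
import ctypes
from multiprocessing import Array

import numpy as np


def snapshot(buf, dtype):
    """Hypothetical helper: return a private copy of the shared buffer's contents."""
    with buf:  # hold the lock only while copying
        return np.frombuffer(buf.get_obj(), dtype=dtype).copy()


shared = Array(ctypes.c_int8, 8)      # 8 byte shared buffer, zero initialised
private = snapshot(shared, np.int16)  # four int16 values owned by this process
private[:] = 7                        # modifies the copy, not the shared buffer
```

After the copy the process no longer needs the lock or the buffer, at the cost of holding the data in memory twice.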

Upvotes: 1
