urschrei

Reputation: 26859

At what point can I pass the array back to my Rust program in order to free its memory?

I'm having difficulty figuring out at what point I can pass the BNG_FFIArray returned by my Rust program back to it, in order to free the memory that it's allocated.

My ctypes setup is as follows:

class BNG_FFITuple(Structure):
    _fields_ = [("a", c_uint32),
                ("b", c_uint32)]

class BNG_FFIArray(Structure):
    _fields_ = [("data", c_void_p),
                ("len", c_size_t)]

    # Allow implicit conversions from a sequence of numbers, stored as
    # 32-bit floats by default (see data_type in __init__).
    @classmethod
    def from_param(cls, seq):
        return seq if isinstance(seq, cls) else cls(seq)

    def __init__(self, seq, data_type = c_float):
        array_type = data_type * len(seq)
        raw_seq = array_type(*seq)
        self.data = cast(raw_seq, c_void_p)
        self.len = len(seq)

# A conversion function that cleans up the result value to make it
# nicer to consume.
def bng_void_array_to_tuple_list(array, _func, _args):
    res = cast(array.data, POINTER(BNG_FFITuple * array.len))[0]
    return res

convert_bng = lib.convert_vec_c
convert_bng.argtypes = (BNG_FFIArray, BNG_FFIArray)
convert_bng.restype = BNG_FFIArray
convert_bng.errcheck = bng_void_array_to_tuple_list

# this is the FFI function I'd like to call. It takes a BNG_FFIArray as its argument
drop_array = lib.drop_array 
drop_array.argtypes = (BNG_FFIArray,)


def convertbng(lons, lats):
    """ just a wrapper """
    return [(i.a, i.b) for i in iter(convert_bng(lons, lats))]

# pass values into the FFI rust function
convertbng([-0.32824866], [51.44533267])

This all works correctly, but I'm not sure at what point I'm supposed to pass the data allocated by my call to lib.convert_vec_c back across the FFI boundary, by calling drop_array, so that its memory is freed.

Here's my Rust struct and function.

#[repr(C)]
pub struct Array {
    data: *const c_void,
    len: libc::size_t,
}

#[no_mangle]
pub extern "C" fn drop_array(arr: Array) {
    unsafe { Vec::from_raw_parts(arr.data as *mut u8, arr.len, arr.len) };
}

impl Array {
    unsafe fn as_f32_slice(&self) -> &[f32] {
        assert!(!self.data.is_null());
        slice::from_raw_parts(self.data as *const f32, self.len as usize)
    }
    unsafe fn as_i32_slice(&self) -> &[i32] {
        assert!(!self.data.is_null());
        slice::from_raw_parts(self.data as *const i32, self.len as usize)
    }

    fn from_vec<T>(mut vec: Vec<T>) -> Array {
        // Important to make length and capacity match
        // A better solution is to track both length and capacity
        vec.shrink_to_fit();

        let array = Array {
            data: vec.as_ptr() as *const libc::c_void,
            len: vec.len() as libc::size_t,
        };

        // Leak the memory, and now the raw pointer is the owner
        mem::forget(vec);

        array
    }
}


#[no_mangle]
pub extern "C" fn convert_vec_c(lon: Array, lat: Array) -> Array {
    // we're receiving floats
    let lon = unsafe { lon.as_f32_slice() };
    let lat = unsafe { lat.as_f32_slice() };
    // copy values and combine
    let orig = lon.iter()
                  .cloned()
                  .zip(lat.iter()
                          .cloned());
    // carry out the conversion
    let result = orig.map(|elem| convert_bng(elem.0 as f64, elem.1 as f64));
    // convert back to vector of unsigned integer Tuples
    let nvec = result.map(|ints| {
                         IntTuple {
                             a: ints.0 as u32,
                             b: ints.1 as u32,
                         }
                     })
                     .collect();
    Array::from_vec(nvec)
}

Upvotes: 2

Views: 331

Answers (1)

huon

Reputation: 102056

There are two ways to manage resources in Python, both of which involve creating an object that either:

defines a __del__ method, or
is a context manager, used via the with statement.

Both of these involve having a manager object that controls/provides access to the resource, which will run any clean-up code necessary when the object is no longer needed. For this case, I think the first one works best, but I'll demonstrate both.

For my examples, I'll use this Rust code, where Data is a stand-in for any resource that needs managing (e.g. your Array type):

// ffi_example.rs
#![crate_type = "dylib"]

pub struct Data {
    x: i32
}

#[no_mangle]
pub extern fn data_create(x: i32) -> *mut Data {
    // move the value to the heap and hand ownership to the caller
    let p = Box::into_raw(Box::new(Data { x: x }));
    println!("Rust: creating: x = {} (at {:p})", x, p);
    p
}

// example function for interacting with the pointer
#[no_mangle]
pub unsafe extern fn data_get(p: *mut Data) -> i32 {
    (*p).x
}

#[no_mangle]
pub unsafe extern fn data_destroy(p: *mut Data) {
    // retake ownership; the Box is freed when `data` goes out of scope
    let data = Box::from_raw(p);
    println!("Rust: destroying: x = {} (at {:p})", data.x, p);
}

Which can be compiled with, say, rustc ffi_example.rs to create libffi_example.so (or similar, depending on platform). This is the start of the Python code I'm using for the two cases (the CDLL call may need to be adjusted):

import sys
import ctypes as c

class RawData(c.Structure):
    pass

lib = c.CDLL('./libffi_example.so')

create = lib.data_create
create.argtypes = [c.c_int]
create.restype = c.POINTER(RawData)

get = lib.data_get
get.argtypes = [c.POINTER(RawData)]
get.restype = c.c_int

destroy = lib.data_destroy
destroy.argtypes = [c.POINTER(RawData)]
destroy.restype = None

(Note that by interfacing via pointers, I don't have to tell Python any info about the internals of RawData.)

You can check everything's working by adding the following, for instance, to the end:

p = create(10)
print('Python: got %s (at 0x%x)' % (get(p), c.addressof(p.contents)))
sys.stdout.flush()
destroy(p)

which prints something like

Rust: creating: x = 10 (at 0x138b7c0)
Python: got 10 (at 0x138b7c0)
Rust: destroying: x = 10 (at 0x138b7c0)

(The flush is to ensure the prints from the two languages appear in the right order, since they have different buffers.)
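On the point that the CDLL call may need to be adjusted: the file name rustc produces for a dynamic library differs by platform. A minimal sketch of picking it, where library_filename is a hypothetical helper (not part of ctypes) and the three naming patterns are the usual rustc defaults:

```python
import sys


def library_filename(name, platform=sys.platform):
    """Guess the file name of a Rust dynamic library called `name`.

    Hypothetical helper for illustration; it only covers the common
    Windows, macOS and Linux/other-Unix naming conventions.
    """
    if platform.startswith('win'):
        return name + '.dll'             # e.g. ffi_example.dll
    if platform == 'darwin':
        return 'lib' + name + '.dylib'   # e.g. libffi_example.dylib
    return 'lib' + name + '.so'          # e.g. libffi_example.so


# Intended use (not executed here, since it needs the compiled library):
#     lib = c.CDLL('./' + library_filename('ffi_example'))
```

The platform argument exists only so the helper can be exercised without switching machines.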

__del__

To use __del__, just make a Python object (not a ctypes.Structure) that serves as the interface to the Rust one, like

class Data:
    def __init__(self, x):
         self._pointer = create(x)

    def get(self):
         return int(get(self._pointer))

    def __del__(self):
         destroy(self._pointer)

This can then be used as a normal object:

obj = Data(123)
print('Python: %s' % obj.get())
sys.stdout.flush()

obj2 = obj # two pointers to the same `Data`

obj = Data(456) # overwrite one
print('Python: %s, %s' % (obj.get(), obj2.get()))
sys.stdout.flush()

obj2 = None # just clear the second reference
print('Python: end')
sys.stdout.flush()

This will print:

Rust: creating: x = 123 (at 0x28aa510)
Python: 123
Rust: creating: x = 456 (at 0x28aa6e0)
Python: 456, 123
Rust: destroying: x = 123 (at 0x28aa510)
Python: end
Rust: destroying: x = 456 (at 0x28aa6e0)

That is, Python can tell when an object definitely no longer has any references (for 123, once both handles obj and obj2 have been overwritten; for 456, when the program ends).

Context managers

If the resource is heavily scoped (which it probably isn't, in this case), it may make sense to instead use a context manager, which will allow something like:

print('Python: before')
sys.stdout.flush()

with Data(789) as obj:
    print('Python: %s' % obj.get())
    sys.stdout.flush()
# obj's internals destroyed here

print('Python: after')
sys.stdout.flush()

This is somewhat error-prone because a handle to an object can be kept outside the with statement, so the object has to check for this or else it may access deallocated memory. For instance,

with Data(1234) as obj:
    pass
# obj's internals destroyed here

print(obj.get()) # oops...

Anyway, implementation:

class Data:
    def __init__(self, x):
        self._x = x
        self._valid = False
    def __enter__(self):
        self._pointer = create(self._x)
        self._valid = True
        return self
    def __exit__(self, exc_type, exc_value, traceback):
        assert self._valid
        destroy(self._pointer)
        self._valid = False
        return False  # don't suppress exceptions raised in the with block

    def get(self):
        if not self._valid:
            raise ValueError('getting from a destroyed Data')
        return int(get(self._pointer))

The first example above gives output like:

Python: before
Rust: creating: x = 789 (at 0x1650530)
Python: 789
Rust: destroying: x = 789 (at 0x1650530)
Python: after

And the second gives:

Rust: creating: x = 1234 (at 0x113d450)
Rust: destroying: x = 1234 (at 0x113d450)
Traceback (most recent call last):
  File "ffi.py", line 82, in <module>
    print(obj.get()) # oops...
  File "ffi.py", line 63, in get
    raise ValueError('getting from a destroyed Data')
ValueError: getting from a destroyed Data

This approach does have the advantage of making the region of code where the resource is valid/allocated clearer, effectively a manual form of the RAII/scope-based resource management of Rust.

Upvotes: 7
