Reputation: 200
I have a Java program that calls out to Rust via JNA, providing the Rust side with a pointer to a potentially large (heap-allocated) buffer of contiguously laid out, \0-terminated UTF-8 strings. The memory is owned by the Java side and freed when the garbage collector finalizes the associated object.
My goal is to process that buffer by interpreting it as a string vector, doing what I need to do, and dropping all structures that Rust allocated on top of the buffer, e.g. Vecs, Strings, etc. Due to the potential size of the buffer, I want to avoid copying the data around, if possible.
Consider the following code:
use std::ffi::CString;
use std::os::raw::c_char;

pub extern "C" fn process_data(data: *const c_char, num_elements: i64) {
    let mut vec: Vec<String> = Vec::with_capacity(num_elements as usize);
    let mut offset = 0;
    unsafe {
        for _ in 0..num_elements {
            let ptr = { data.offset(offset as isize) };
            // Main goal here is to have no memory copy involved
            let s = String::from_utf8_unchecked(CString::from_raw(ptr as *mut c_char).into_bytes());
            offset += s.len() + 1; // Include string termination
            vec.push(s);
        }
    }
    // do stuff with the vector
    // ...
    // Now that we're done, vec would be dropped, freeing the strings, thus freeing their underlying memory.
}
My understanding is that I now have a Vec that internally points to a buffer containing Strings that in turn internally point to Vecs, which then point in some way into the buffer I passed in.
If I let the code run like this without forgetting the vector explicitly, I get a double free because Java tries to deallocate the buffer, but Rust already did so by dropping the vector. Makes sense. However, forgetting the vector leaks all "management" structures on top of the buffer.
I thought about how I could deallocate everything that Rust allocated without leaking any memory. I thought about explicitly leaking boxes and dropping the pointers they give me (because Java still has a pointer) along the lines of:
fn forget_vec(vec: Vec<String>) {
    vec.into_iter().for_each(|s| {
        // Leak each string's bytes so that dropping it does not free Java's memory
        let _ = Box::into_raw(s.into_bytes().into_boxed_slice());
    });
}
However, a slice is also a structure that contains a length and a pointer, and I think I'd leak this structure by doing the above. I was looking for something that consumes the slice and returns only a pointer, like *const u8.
I have a feeling that I'm generally going in the right direction, but I'm either missing something major or understand too little of Rust to make it work outright.
Upvotes: 3
Views: 1377
Reputation: 430574
Reread the documentation for CString, emphasis mine:
A type representing an owned, C-compatible, nul-terminated string with no nul bytes in the middle.
This type serves the purpose of being able to safely generate a C-compatible string from a Rust byte slice or vector.
You do not own these strings; Java does. Use &str and CStr instead:
use std::ffi::CStr;
use std::os::raw::c_char;

pub extern "C" fn process_data(data: *const c_char, num_elements: i64) {
    let mut vec: Vec<&str> = Vec::with_capacity(num_elements as usize);
    unsafe {
        let mut ptr = data;
        for _ in 0..num_elements {
            let s = CStr::from_ptr(ptr);
            ptr = ptr.add(s.to_bytes().len() + 1); // Include string termination
            if let Ok(s) = s.to_str() {
                vec.push(s);
            }
        }
    }
}
When your Vec is dropped, it just drops references and nothing is deallocated except the Vec itself.
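To try the borrowing approach without a Java caller, the loop can be pulled into a small helper and fed a Rust-owned byte buffer. This is only a sketch: the names `borrow_strings` and `example` are hypothetical and not part of the code above, and the unbounded lifetime `'a` is an assumption the caller has to uphold, since the compiler cannot check how long Java keeps the buffer alive.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Hypothetical helper: borrows `num_elements` consecutive nul-terminated
/// strings starting at `data`. No string data is copied; only the Vec of
/// references is allocated.
///
/// # Safety
/// `data` must point to at least `num_elements` nul-terminated strings laid
/// out back to back, and that buffer must stay alive and unmodified for as
/// long as the returned references are used.
pub unsafe fn borrow_strings<'a>(data: *const c_char, num_elements: usize) -> Vec<&'a str> {
    let mut out = Vec::with_capacity(num_elements);
    let mut ptr = data;
    for _ in 0..num_elements {
        // Borrow the bytes up to (not including) the next nul terminator.
        let s: &'a CStr = unsafe { CStr::from_ptr(ptr) };
        // Advance past the string and its terminator.
        ptr = unsafe { ptr.add(s.to_bytes().len() + 1) };
        // Entries that are not valid UTF-8 are skipped, as in the code above.
        if let Ok(s) = s.to_str() {
            out.push(s);
        }
    }
    out
}

/// Stand-in for the Java-owned buffer: a static byte string in Rust.
pub fn example() -> Vec<&'static str> {
    let buffer: &'static [u8] = b"foo\0bar\0baz\0";
    // Dropping the returned Vec frees only the Vec, never `buffer`.
    unsafe { borrow_strings(buffer.as_ptr().cast::<c_char>(), 3) }
}
```

Because the Vec holds only `&str` references, dropping it cannot free the underlying buffer, so there is nothing to forget or leak on the Rust side. The references must not be stored anywhere that outlives the call from Java.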
Upvotes: 7