Edward Peters

Reputation: 4062

Does Rust's borrow checker really mean that I should re-structure my program?

So I've read "Why can't I store a value and a reference to that value in the same struct?" and I understand why my naive approach was not working, but I'm still very unclear on how to better handle my situation.

I have a program I wanted to structure as follows (details omitted because I can't make this compile anyway):

use std::sync::Mutex;

struct Species{
    index : usize,
    population : Mutex<usize>
}
struct Simulation<'a>{
    species : Vec<Species>,
    grid : Vec<&'a Species>
}
impl<'a> Simulation<'a>{
    pub fn new() -> Self {...} //I don't think there's any way to implement this
    pub fn run(&self) {...}
}

The idea is that I create a vector of Species (which won't change for the lifetime of Simulation, except in specific mutex-guarded fields) and then a grid representing which species live where, which will change freely. This implementation won't work, at least not in any way I've been able to figure out. As I understand it, the issue is that however I write my new method, the moment it returns, all of the references in grid would become invalid, because Simulation and therefore Simulation.species are moved to another location on the stack. Even if I could prove to the compiler that species and its contents would continue to exist, they actually won't be in the same place. Right?

I've looked into various ways around this, such as making species as an Arc on the heap or using usizes instead of references and implementing my own lookup function into the species vector, but these seem slower, messier or worse. What I'm starting to think is that I need to really re-structure my code to look something like this (details filled in with placeholders because now it actually runs):

use std::sync::Mutex;

struct Species{
    index : usize,
    population : Mutex<usize>
}
struct Simulation<'a>{
    species : &'a Vec<Species>, //Now just holds a reference rather than data
    grid : Vec<&'a Species>
}
impl<'a> Simulation<'a>{
    pub fn new(species : &'a Vec <Species>) -> Self { //has to be given pre-created species
        let grid = vec!(species.first().unwrap(); 10);
        Self{species, grid}
    }
    pub fn run(&self) {
        let mut population = self.grid[0].population.lock().unwrap();
        println!("Population: {}", population);
        *population += 1;
    }
}

pub fn top_level(){
    let species = vec![Species{index: 0, population : Mutex::new(0_)}];
    let simulation = Simulation::new(&species);
    simulation.run();
}

As far as I can tell this runs fine, and ticks off all the ideal boxes:

But, this feels very weird to me: the two-step initialization process of creating owned memory and then references can't be abstracted any way that I can see, which feels like I'm exposing an implementation detail to the calling function. top_level has to also be responsible for establishing any other functions or (scoped) threads to run the simulation, call draw/gui functions, etc. If I need multiple levels of references, I believe I will need to add additional initialization steps to that level.

So, my question is just "Am I doing this right?". While I can't exactly prove this is wrong, I feel like I'm losing a lot of near-universal abstraction of the call structure. Is there really no way to return species and simulation as a pair at the end (with some one-off update to make all references point to the "forever home" of the data)?

Phrasing my problem a second way: I do not like that I cannot have a function with a signature of () -> Simulation, when I can have a pair of function calls that have that same effect. I want to be able to encapsulate the creation of this simulation. I feel like the fact that this approach cannot do so indicates I'm doing something wrong, and that there may be a more idiomatic approach I'm missing.

Upvotes: 2

Views: 509

Answers (2)

JMAA

Reputation: 2059

If you're owning a Vec of objects, then want to also keep track of references to particular objects in that Vec, a usize index is almost always the simplest design. It might feel like extra boilerplate to you now, but it's a hell of a lot better than properly dealing with keeping pointers in check in a self-referential struct (as somebody who's made this mistake in C++ more than I should have, trust me). Rust's rules are saving you from some real headaches, just not ones that are obvious to you necessarily.
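To make the index-based layout concrete, here is a minimal sketch of the question's Simulation rewritten so that grid stores usize indices into species. The helper species_at is a hypothetical name introduced only for this illustration, and run returns the updated population purely so the effect is easy to check. Because nothing in the struct borrows from itself, the lifetime parameter disappears and a plain new() -> Self becomes possible.

```rust
use std::sync::Mutex;

pub struct Species {
    pub index: usize,
    pub population: Mutex<usize>,
}

pub struct Simulation {
    pub species: Vec<Species>, // owned data, so no lifetime parameter is needed
    pub grid: Vec<usize>,      // each cell holds an index into `species`
}

impl Simulation {
    // Creation is fully encapsulated: the caller never sees `species`.
    pub fn new() -> Self {
        let species = vec![Species { index: 0, population: Mutex::new(0) }];
        let grid = vec![0; 10]; // every cell starts out pointing at species 0
        Self { species, grid }
    }

    // Hypothetical lookup helper: resolve a grid cell to the species living there.
    pub fn species_at(&self, cell: usize) -> &Species {
        &self.species[self.grid[cell]]
    }

    // Increment the population of the species in cell 0 and return the new value.
    pub fn run(&self) -> usize {
        let mut population = self.species_at(0).population.lock().unwrap();
        *population += 1;
        *population
    }
}
```

The lookup costs one extra bounds-checked index compared with following a reference; whether that matters in practice is something to measure rather than assume.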

If you want to get fancy and feel like a raw usize is too arbitrary, then I recommend you look at slotmap. A simple SlotMap is internally not much more than an array of values, so iteration is fast and storage is efficient. But it gives you generational indices (slotmap calls these "keys") to the values: each value is embellished with a "generation", and each index also internally keeps hold of its generation. You can therefore safely remove and replace items in the Vec without your references suddenly pointing at a different object. It's really cool.
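The generational-index idea can be sketched with the standard library alone. The following is not slotmap's API or implementation; Arena, Slot, and Key are hypothetical names for this illustration, and the linear search for a free slot is a simplification. It only shows why a stale key stops resolving once its slot has been reused.

```rust
// A key remembers which slot it refers to and which generation it was issued for.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Key {
    slot: usize,
    generation: u32,
}

struct Slot<T> {
    generation: u32,
    value: Option<T>,
}

pub struct Arena<T> {
    slots: Vec<Slot<T>>,
}

impl<T> Arena<T> {
    pub fn new() -> Self {
        Self { slots: Vec::new() }
    }

    // Store a value, reusing the first empty slot if there is one.
    pub fn insert(&mut self, value: T) -> Key {
        if let Some(slot) = self.slots.iter().position(|s| s.value.is_none()) {
            let s = &mut self.slots[slot];
            s.value = Some(value);
            return Key { slot, generation: s.generation };
        }
        self.slots.push(Slot { generation: 0, value: Some(value) });
        Key { slot: self.slots.len() - 1, generation: 0 }
    }

    // Remove a value and bump the slot's generation so old keys become stale.
    pub fn remove(&mut self, key: Key) -> Option<T> {
        let s = self.slots.get_mut(key.slot)?;
        if s.generation != key.generation {
            return None;
        }
        let value = s.value.take()?;
        s.generation += 1;
        Some(value)
    }

    // Look a value up; a key from an earlier generation yields None.
    pub fn get(&self, key: Key) -> Option<&T> {
        let s = self.slots.get(key.slot)?;
        if s.generation == key.generation {
            s.value.as_ref()
        } else {
            None
        }
    }
}
```

With a bare usize, a key taken before a removal would silently resolve to whatever value later reused the slot; here the generation mismatch turns that into a None the caller has to handle.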

Upvotes: 2

Kevin Reid

Reputation: 43753

"I've looked into various ways around this, such as making species as an Arc on the heap or using usizes instead of references and implementing my own lookup function into the species vector, but these seem slower, messier or worse."

Don't assume that, test it. I once had a self-referential (using ouroboros) structure much like yours, with a vector of things and a vector of references to them. I tried rewriting it to use indices instead of references, and it was faster.

Rc/Arc is also an option worth trying out — note that there is only an extra cost to the reference counting when an Arc is cloned or dropped. Arc<Species> doesn't cost any more to dereference than &Species, and you can always get an &Species from an Arc<Species>. So the reference counting only matters if and when you're changing which Species is in an element of Grid.
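As a rough sketch of the Arc variant, assuming the same Species type as in the question, both species and grid can hold Arc<Species>. The struct then owns everything it points to, so new() needs no arguments. Returning the updated population from run is an addition made here only so the behaviour can be checked.

```rust
use std::sync::{Arc, Mutex};

pub struct Species {
    pub index: usize,
    pub population: Mutex<usize>,
}

pub struct Simulation {
    pub species: Vec<Arc<Species>>,
    pub grid: Vec<Arc<Species>>, // each cell shares ownership of one species
}

impl Simulation {
    pub fn new() -> Self {
        let species = vec![Arc::new(Species { index: 0, population: Mutex::new(0) })];
        // Filling the grid clones the Arc (a reference-count bump), not the Species.
        let grid = vec![Arc::clone(&species[0]); 10];
        Self { species, grid }
    }

    pub fn run(&self) -> usize {
        // Borrowing &Species out of the Arc does not touch the reference count.
        let cell: &Species = &self.grid[0];
        let mut population = cell.population.lock().unwrap();
        *population += 1;
        *population
    }
}
```

Changing which species occupies a cell would be an assignment such as grid[i] = Arc::clone(&species[j]), which is the only place the reference count is updated.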

Upvotes: 5
