Reputation: 1683
The following code involves a very subtle bit of borrow checker dodging. The code itself describes the reasoning. The questions:
/// Insert a new data element at a given key.
pub fn insert<'a, K: Eq, V>(this: &'a mut Vec<(K, V)>, key: K, val: V) -> &'a mut V {
// Safety: As indicated below, we would like to return val1 directly in the loop,
// but rust will reject this, claiming a double borrow, and we instead use some
// unsafe hacks to circumvent the borrow checker. To show this is safe, consider
// two cases.
// - If the return is exercised (we found an element and early out):
// - let 'b be 'a (the borrow of self),
// - and let 'c be empty
// - Otherwise (we did not find an element, exit the loop and terminate normally):
// - let 'b be the duration of the loop,
// - and let 'c be from the end of the loop until the end of 'a
// In either case, 'b and 'c are disjoint, so the "double borrow" is safe.
// The borrow checker reasons that 'b has to be at least 'a because it is returned,
// and therefore it overlaps with 'c, but these happen in mutually exclusive
// situations.
for (key1, val1) in & /* 'b */ mut *this {
if key == *key1 {
// return val1; // we would like to write this
return unsafe { // safety, see above. We know we are in the first case, so 'b = 'a
std::mem::transmute::<&/* 'b */ mut V, &/* 'a */ mut V>(val1)
}
}
}
let this = & /* 'c */ mut *this;
this.push((key, val));
&mut this.last_mut().unwrap().1
}
This is what I'd prefer to write:
/// Insert a new data element at a given key.
pub fn insert<'a, K: Eq, V>(this: &'a mut Vec<(K, V)>, key: K, val: V) -> &'a mut V {
for (key1, val1) in &mut *this {
if key == *key1 {
return val1;
}
}
let this = &mut *this;
this.push((key, val));
&mut this.last_mut().unwrap().1
}
but it fails:
error[E0499]: cannot borrow `*this` as mutable more than once at a time
--> src/lib.rs:8:16
|
2 | pub fn insert<'a, K: Eq, V>(this: &'a mut Vec<(K, V)>, key: K, val: V) -> &'a mut V {
| -- lifetime `'a` defined here
3 | for (key1, val1) in &mut *this {
| ---------- first mutable borrow occurs here
4 | if key == *key1 {
5 | return val1;
| ---- returning this value requires that `*this` is borrowed for `'a`
...
8 | let this = &mut *this;
| ^^^^^^^^^^ second mutable borrow occurs here
Upvotes: 5
Views: 495
Reputation: 430673
As stated in the related questions, the easiest thing to do is to use an index instead as it requires no unsafe code. I might write it like this:
pub fn insert<'a, K: Eq, V>(this: &'a mut Vec<(K, V)>, key: K, val: V) -> &'a mut V {
let idx = this
.iter()
.enumerate()
.find_map(|(i, (k, _))| if key == *k { Some(i) } else { None });
let idx = idx.unwrap_or_else(|| {
this.push((key, val));
this.len() - 1
});
&mut this[idx].1
}
You should perform benchmarking to know if this is not fast enough for some reason. Only in that case should you opt in to unsafe
code to get the last bit of speed. You should then benchmark again to see if the code is measurably faster.
For example, you might be able to get the speedup by using slice::get_unchecked_mut
instead of &mut this[idx].1
, which is a much easier bit of unsafe code to rationalize.
The nice thing about using indices in our safe code is that they directly translate into pointer offset logic. We can take this safe example and make minimal modifications to it to get a version using unsafe
code:
pub fn insert<'a, K: Eq, V>(this: &'a mut Vec<(K, V)>, key: K, val: V) -> &'a mut V {
// I copied this code from Stack Overflow without reading the surrounding
// text which explained why this code is or is not safe.
unsafe {
let found = this
.iter_mut()
.find_map(|(k, v)| if key == *k { Some(v as *mut V) } else { None });
match found {
Some(v) => &mut *v,
None => {
this.push((key, val));
&mut this.last_mut().unwrap().1
}
}
}
}
The main points of safety revolve around the pointer to the value in found
. It started as a mutable reference, so we know that it was valid when we were iterating. We know that find_map
stops iterating on the first Some
, and we know that iterating using iter_mut()
shouldn't change our values anyway. Since this
cannot change between the binding of found
and the usage of it in the match
, I believe that this piece of code is safe.
It's always valuable to exercise your code through Miri. You must actually exercise the code, as Miri only flags code that causes undefined behavior, ignoring any dormant code paths. This code is Miri-clean:
fn main() {
let mut things = vec![(1, 2), (3, 4)];
let v = insert(&mut things, 1, 2);
println!("{} ({:p})", v, v);
let v = insert(&mut things, 1, 2);
println!("{} ({:p})", v, v);
let v = insert(&mut things, 5, 6);
println!("{} ({:p})", v, v);
let v = insert(&mut things, 5, 6);
println!("{} ({:p})", v, v);
}
2 (0x2829c)
2 (0x2829c)
6 (0x41054)
6 (0x41054)
Is [the original implementation] actually safe?
Miri reports no issues for the same test code I used above, and I don't see anything obviously wrong.
Is this the recommended way to express the unsafe operations performed? Should I use pointers instead?
When it's possible to avoid mem::transmute
, it generally should be avoided. transmute
is The Big Hammer and can do quite a lot of things that you might not intend (changing types is a key one). Using pointers feels much simpler in this case.
I agree with the usage of a comment to demonstrate why the unsafe code is safe. Even if it's wrong it still demonstrates the mindset of the original author. A future reviewer may be able to say "ah, they didn't think about checklist item #42, let me test that!".
Specifically for the reasoning in your comment, it's overly dense / academic to me. I don't see why there's talk about multiple lifetimes or double borrows.
Will the new Polonius borrow checker be able to reason about patterns like this?
Yes:
% cargo +nightly rustc --
Compiling example v0.1.0 (/private/tmp/example)
error[E0499]: cannot borrow `*this` as mutable more than once at a time
--> src/main.rs:8:16
|
2 | pub fn insert<'a, K: Eq, V>(this: &'a mut Vec<(K, V)>, key: K, val: V) -> &'a mut V {
| -- lifetime `'a` defined here
3 | for (key1, val1) in &mut *this {
| ---------- first mutable borrow occurs here
4 | if key == *key1 {
5 | return val1;
| ---- returning this value requires that `*this` is borrowed for `'a`
...
8 | let this = &mut *this;
| ^^^^^^^^^^ second mutable borrow occurs here
% cargo +nightly rustc -- -Zpolonius
Compiling example v0.1.0 (/private/tmp/example)
Finished dev [unoptimized + debuginfo] target(s) in 0.86s
% ./target/debug/example
2 (0x7f97ea405b24)
2 (0x7f97ea405b24)
6 (0x7f97ea405ba4)
6 (0x7f97ea405ba4)
See also:
Upvotes: 4
Reputation: 8944
Firstly, here is what I would suggest instead. You can iterate over the Vec
once to get the index of the target value via position(|x| x == y)
. You are then able to match the now owned value and continue like before. This should have very similar performance to your previous version (In fact, LLVM might even make it identical when built with release mode).
/// Insert a new data element at a given key.
pub fn insert<K: Eq, V>(this: &mut Vec<(K, V)>, key: K, val: V) -> &mut V {
match this.iter().position(|(key1, _)| &key == key1) {
Some(idx) => &mut this[idx].1,
None => {
this.push((key, val));
&mut this.last_mut().unwrap().1
}
}
}
Here is a quick explanation of why the compiler is getting confused. It is easier to view if I first rewrite it to separate the creation of the iterator. I also added a second lifetime to the function signature to make it less restrictive and easier to show the error. To be honest it kind of feels like a mistake on the part of the borrow checker, but I can see how it got there.
use std::slice::IterMut;
// Returns a reference of borrowed value 'a of lifetime 'b. Since this reference // may exist up to the end of 'a, we know that 'b <= 'a.
pub fn insert<'a: 'b, 'b, K: Eq, V>(this: &'a mut Vec<(K, V)>, key: K, val: V) -> &'b mut V {
// The problem comes from trying to identify an appropriate lifetime for 'c.
// While iterating, each item also more or less shares the lifetime 'c.
let iterator: IterMut<'c, (K, V)> = this.into_iter();
for (ref mut key1, ref mut val1) in iterator {
if key == *key1 {
// Since this is the returned value, it must have lifetime 'b to match
// the function signature. But at the same time it must also live for 'c.
// Therefore 'b <= 'c.
return val1
}
}
// So at this point the constraints we have so far are as follows:
// 'b <= 'a
// 'c <= 'a
// 'b <= 'c
// Therefore 'b <= 'c <= 'a
// Due to the next line, 'c mandates the iterator is still alive making this the
// second mutable borrow.
this.push((key, val));
// This lives for 'b, but since 'b <= 'c then 'c still exists
&mut this.last_mut().unwrap().1
}
unsafe
? If it uses unsafe
then it is not safe. Safe/unsafe is not about if it should work. Just because C code works, doesn't make it safe. It is about if our code has the potential for human error causing the program to act in ways the compiler can't account for. We only deem something unsafe to be safe once we have tried it under a number of conditions and it reliably works as expected with no exceptions. So "is this actually safe?" is more of question of how much much trust you have in this code.fn foo<'a>(&'a A) -> &'a B
lifetimes. This can be more restrictive because it forces the returned lifetime to be the same as the input. The implicit version looks more like fn foo<'a: 'b, 'b>(&'a A) -> &'b B
and only requires that the input lifetime is longer than the returned lifetime.Upvotes: -1