Krish

Reputation: 1081

Why must pointers used by `offset_from` be derived from a pointer to the same object?

From the standard library:

Both pointers must be derived from a pointer to the same object. (See below for an example.)

let ptr1 = Box::into_raw(Box::new(0u8));
let ptr2 = Box::into_raw(Box::new(1u8));
let diff = (ptr2 as isize).wrapping_sub(ptr1 as isize);
// Make ptr2_other an "alias" of ptr2, but derived from ptr1.
let ptr2_other = (ptr1 as *mut u8).wrapping_offset(diff);
assert_eq!(ptr2 as usize, ptr2_other as usize);
// Since ptr2_other and ptr2 are derived from pointers to different 
// objects, computing their offset is undefined behavior, even though
// they point to the same address!
unsafe {
    let zero = ptr2_other.offset_from(ptr2); // Undefined Behavior
}
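For contrast, here is a minimal sketch of the case `offset_from` is meant for. Both pointers are derived from the same base pointer into a single array, so they share one provenance and one allocation. The function name `distance_in_elements` is hypothetical and exists only for illustration.

```rust
/// Hypothetical helper: distance, in elements, between two pointers into one array.
fn distance_in_elements() -> isize {
    let arr = [10u32, 20, 30, 40, 50];
    let base: *const u32 = arr.as_ptr();
    // Both pointers are derived from `base`, so they share its provenance
    // and stay inside the same allocated object (the array).
    let p1 = unsafe { base.add(1) };
    let p4 = unsafe { base.add(4) };
    // offset_from measures in units of T (u32 here), not bytes.
    unsafe { p4.offset_from(p1) }
}

fn main() {
    println!("{}", distance_in_elements()); // prints 3
}
```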

I do not understand why this must be the case.

Upvotes: 6

Views: 393

Answers (1)

kmdreko

Reputation: 60447

This has to do with a concept called "provenance", meaning "the place of origin". The Rust Unsafe Code Guidelines have a section on Pointer Provenance. It's a pretty abstract rule, but the section explains that provenance is an extra piece of information attached to a pointer, beyond its address, which the compiler uses to decide which pointer transformations are well defined.

// Let's assume the two allocations here have base addresses 0x100 and 0x200.
// We write pointer provenance as `@N` where `N` is some kind of ID uniquely
// identifying the allocation.
let raw1 = Box::into_raw(Box::new(13u8));
let raw2 = Box::into_raw(Box::new(42u8));
let raw2_wrong = raw1.wrapping_add(raw2.wrapping_sub(raw1 as usize) as usize);
// These pointers now have the following values:
// raw1 points to address 0x100 and has provenance @1.
// raw2 points to address 0x200 and has provenance @2.
// raw2_wrong points to address 0x200 and has provenance @1.
// In other words, raw2 and raw2_wrong have same *address*...
assert_eq!(raw2 as usize, raw2_wrong as usize);
// ...but it would be UB to dereference raw2_wrong, as it has the wrong *provenance*:
// it points to address 0x200, which is in allocation @2, but the pointer
// has provenance @1.

The guidelines link to two good articles, "Pointers Are Complicated" and its follow-up "Pointers Are Complicated II", which go into more detail and coined the phrase:

Just because two pointers point to the same address, does not mean they are equal and can be used interchangeably.

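Essentially, it is invalid to access memory through a pointer at an address outside that pointer's original allocation, even if you can guarantee that a valid object exists there. Allowing it would undermine the aliasing rules the compiler relies on for many optimizations, and there's practically never a good reason to do it. The same reasoning covers `offset_from`: it is only defined for two pointers derived from the same allocation, so `ptr2_other` (which carries the provenance of `ptr1`'s box) and `ptr2` cannot be passed to it together, even though their addresses are equal.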
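A small sketch of the kind of reasoning provenance makes possible. This is an illustration only, not a description of what any particular compiler pass does, and `write_then_read` is a hypothetical name. Because `a` and `b` come from two separate boxes, a write through `b` cannot change `*a`. If pointer arithmetic were allowed to carry a pointer from one allocation into another, that guarantee would no longer hold.

```rust
/// Hypothetical example: two pointers with distinct provenance.
fn write_then_read() -> u8 {
    let a = Box::into_raw(Box::new(1u8));
    let b = Box::into_raw(Box::new(2u8));
    let x = unsafe {
        // This write goes through `b`, which may only touch the second allocation...
        *b = 7;
        // ...so an optimizer may assume `*a` still holds 1 here.
        *a
    };
    // Return ownership to Box so both allocations are freed.
    unsafe {
        drop(Box::from_raw(a));
        drop(Box::from_raw(b));
    }
    x
}

fn main() {
    println!("{}", write_then_read()); // prints 1
}
```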

This concept is mostly inherited from C and C++.


If you're curious whether you've written code that violates this rule, running it through Miri, Rust's undefined-behavior detection tool, can often catch it.

fn main() {
    let ptr1 = Box::into_raw(Box::new(0u8));
    let ptr2 = Box::into_raw(Box::new(1u8));
    let diff = (ptr2 as isize).wrapping_sub(ptr1 as isize);
    let ptr2_other = (ptr1 as *mut u8).wrapping_offset(diff);
    assert_eq!(ptr2 as usize, ptr2_other as usize);
    unsafe { println!("{} {} {}", *ptr1, *ptr2, *ptr2_other) };
}
error: Undefined Behavior: memory access failed: pointer must be in-bounds at offset 1200, but is outside bounds of alloc1444 which has size 1
 --> src/main.rs:7:49
  |
7 |     unsafe { println!("{} {} {}", *ptr1, *ptr2, *ptr2_other) };
  |                                                 ^^^^^^^^^^^ memory access failed: pointer must be in-bounds at offset 1200, but is outside bounds of alloc1444 which has size 1
  |
  = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
  = help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
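A sketch of a well-defined variant of the program above, assuming the goal is just to read both values. The address arithmetic is kept as plain integers, and every read goes through the pointer that was actually derived from that allocation. The boxes are also reclaimed with `Box::from_raw`, so nothing leaks. The function name `read_values` is hypothetical. Because nothing is dereferenced through a pointer with the wrong provenance, I'd expect Miri to accept it, but that is an expectation, not a recorded run.

```rust
/// Hypothetical helper: reads both boxed values without mixing provenance.
fn read_values() -> (u8, u8) {
    let ptr1 = Box::into_raw(Box::new(0u8));
    let ptr2 = Box::into_raw(Box::new(1u8));
    // Address arithmetic is fine as long as the result stays an integer.
    let diff = (ptr2 as isize).wrapping_sub(ptr1 as isize);
    assert_eq!((ptr1 as isize).wrapping_add(diff), ptr2 as isize);
    // Each read goes through the pointer derived from that allocation.
    let values = unsafe { (*ptr1, *ptr2) };
    // Return ownership to Box so both allocations are freed.
    unsafe {
        drop(Box::from_raw(ptr1));
        drop(Box::from_raw(ptr2));
    }
    values
}

fn main() {
    let (a, b) = read_values();
    println!("{a} {b}"); // prints "0 1"
}
```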

Upvotes: 6
