Why do Read::read and Read::read_exact require that the buffers passed to them be initialized?

Question

I have a reader that contains info about a 51*51 grid, where each spot on the grid is represented by an f32. I want to read this data into a vector so that I can easily deal with it:

pub fn from_reader<R: Read>(reader: &mut R) -> Arena {
    let arena_size = 51 * 51;
    let arena_byte_size = arena_size * size_of::<f32>();
    let mut arena = vec![0.0f32; arena_size];

    unsafe {
        let mut arena_slice =
            std::slice::from_raw_parts_mut(arena.as_mut_ptr() as *mut u8, arena_byte_size);
        let _ = reader.read(&mut arena_slice);
    };
    //...
}

This method is inconvenient and unnecessarily slow as it forces the vector to be initialized with 0 values for all its elements. I originally wanted to simply allocate a buffer, not initialize it, read the data into it then use from_raw_parts to create a vector out of it. However I was informed that this is undefined behavior since for some unfathomable reason, read and read_exact require the caller to initialize the data being passed to them before calling either of them.

Why is this the case? Is there any workaround? Are there any solutions being worked on by the Rust team?

Shepmaster · Accepted Answer

Why is this the case?

Because it's valid for an implementer of Read to read the passed-in buffer first. If you passed in uninitialized data and the implementer of Read looked at the buffer, then there would be undefined behavior in purely safe code. Disallowing that, statically, is a large selling point of Rust.

use std::io::{self, Read};

struct Dummy;

impl Read for Dummy {
    fn read(&mut self, buffer: &mut [u8]) -> io::Result<usize> {
        let v: u8 = buffer.iter().sum(); // Reading from the buffer
        buffer[0] = v;
        Ok(1)
    }
}

fn main() {
    let mut data = [0, 1, 2];
    Dummy.read(&mut data).unwrap();
    println!("{:?}", data);
}

Why does Read::read not prevent reading from the buffer?

There isn't a language construct that can be used to impose that restriction. Unlike some other languages, Rust doesn't have "out parameters". Even if it did, I could see an implementer of Read wanting the ability to read the data that it just wrote. For example, a reader that counted the number of newlines that passed through it.
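
As a rough sketch of that last idea, here is what such a newline-counting adapter might look like. The names NewlineCounter and count_newlines are hypothetical, made up for illustration; the point is only that the adapter inspects the bytes the inner reader just wrote into the buffer.

```rust
use std::io::{self, Read};

// Hypothetical adapter: wraps any reader and counts the newlines that pass through it.
struct NewlineCounter<R> {
    inner: R,
    newlines: usize,
}

impl<R: Read> Read for NewlineCounter<R> {
    fn read(&mut self, buffer: &mut [u8]) -> io::Result<usize> {
        let n = self.inner.read(buffer)?;
        // Look back at the first `n` bytes, which the inner reader just filled in.
        self.newlines += buffer[..n].iter().filter(|&&b| b == b'\n').count();
        Ok(n)
    }
}

// Hypothetical helper: drains the input through the adapter and reports the count.
fn count_newlines(input: &[u8]) -> io::Result<usize> {
    let mut reader = NewlineCounter { inner: input, newlines: 0 };
    let mut sink = Vec::new();
    reader.read_to_end(&mut sink)?;
    Ok(reader.newlines)
}
```

Calling count_newlines(b"a\nb\nc") returns Ok(2). Note that the adapter only looks at bytes that were written during the same call, but the signature cannot express that limit, so nothing stops another implementation from reading the whole buffer.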

Why does Read::read not accept MaybeUninit?

MaybeUninit didn't exist in Rust 1.0; it was only stabilized in Rust 1.36. We wanted the ability to read from files in Rust 1.0. Due to Rust's backwards-compatibility guarantees, the method's signature cannot be changed now.

Why is Read::read not unsafe?

This would have been the main (only?) technique to support uninitialized data, but it would have come at a high cost. unsafe isn't a tool that experienced Rust programmers choose trivially. When we do use it, we generally strive really hard to minimize its scope.

If Read::read were unsafe, then every implementer would have to think about how to properly meet the unsafe criteria. This is a high burden to place on "simple" adapters.

Is there any workaround? Are there any solutions being worked on by the Rust team?

The unstable Read::initializer method is one proposed solution, but it's likely not the preferred route.

RFC 2930 provides an updated attempt, and discusses much of the backstory and challenges.
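
Until something like that is stable, one possible workaround in safe code is to let the standard library manage the buffer: reserve capacity with Vec::with_capacity, then fill it with take plus read_to_end, so your own code never zero-fills the vector. The sketch below is only an illustration under assumptions: the function name read_grid is hypothetical, the data is assumed to be little-endian, and whether the library initializes the spare capacity internally is an implementation detail.

```rust
use std::io::{self, Read};
use std::mem::size_of;

const ARENA_SIZE: usize = 51 * 51;

// Hypothetical helper: reads exactly ARENA_SIZE f32 values from `reader`.
fn read_grid<R: Read>(reader: &mut R) -> io::Result<Vec<f32>> {
    let byte_len = ARENA_SIZE * size_of::<f32>();
    // Reserve room for the bytes; no explicit zero-fill in our own code.
    let mut bytes = Vec::with_capacity(byte_len);
    // `take` caps how much is read; `read_to_end` appends to the Vec.
    reader.by_ref().take(byte_len as u64).read_to_end(&mut bytes)?;
    if bytes.len() != byte_len {
        return Err(io::Error::new(
            io::ErrorKind::UnexpectedEof,
            "grid data truncated",
        ));
    }
    // Assumption: the values were written in little-endian byte order.
    Ok(bytes
        .chunks_exact(size_of::<f32>())
        .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect())
}
```

This trades the zero-fill for a conversion pass over the bytes, so it is worth measuring before assuming it is faster than vec![0.0f32; arena_size] followed by read_exact.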
