Reputation: 53
I'm trying to rewrite a portion of the GNU coreutils 'split' tool, to split a file into multiple parts of approximately the same size.
Part of my program reads large portions of a file just to write them into another file. I don't want to load these portions into memory, because they can be anywhere from zero bytes up to several gigabytes long.
Here's an extract of the code I wrote using a BufReader:
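use std::fs::File;
use std::io::{BufReader, Read};
// `?` assumes the enclosing function returns io::Result
let file = File::open("myfile.txt")?;
let mut buffer = Vec::new();
let reader = BufReader::new(&file);
let mut handle = reader.take(length); // length is a u64: it can be 10 bytes or 1 GB!
let read = handle.read_to_end(&mut buffer)?; // number of bytes read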
I feel like I'm loading the whole chunk of the file into memory because of the read_to_end(&mut buffer) call. Am I? If not, does that mean the BufReader is doing its job, and can I just accept that it's doing some kind of magic (abstraction) that lets me "read" an entire portion of a file without really loading it into memory? Or am I misusing these concepts in my code?
Upvotes: 0
Views: 2495
Reputation: 3261
Yes. If we look at the source of the read_to_end function, we can see that the buffer you give it is extended to hold the new data as it comes in whenever the available space in the vector is exhausted.
Even the docs tell us that it reads everything until EOF into the buffer:
Read all bytes until EOF in this source, placing them into buf
You can also take a look at the code presented in this question as a starting point using a BufReader:
use std::{
    fs::File,
    io::{self, BufRead, BufReader},
};

fn main() -> io::Result<()> {
    const CAP: usize = 1024 * 128;
    let file = File::open("my.file")?;
    let mut reader = BufReader::with_capacity(CAP, file);
    loop {
        let length = {
            let buffer = reader.fill_buf()?;
            // do stuff with buffer here
            buffer.len()
        };
        if length == 0 {
            break;
        }
        reader.consume(length);
    }
    Ok(())
}
A better approach might be to set up an unbuffered reader and read bytes directly into a fixed-size buffer of your own, checking that you do not exceed whatever byte or line bounds the user specified, and writing the buffer contents to the output file.
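As a rough illustration of that idea, here is a minimal sketch that copies at most length bytes from any reader to any writer through a fixed-size buffer. The function name copy_chunk and the 64 KiB buffer size are assumptions made for this example, not something taken from the question; only byte bounds are handled, not line bounds.

```rust
use std::io::{self, Read, Write};

/// Copies up to `length` bytes from `reader` to `writer` through a fixed-size
/// buffer, so memory use is bounded by `BUF_SIZE` rather than by `length`.
/// Returns the number of bytes actually copied (less than `length` on EOF).
/// `copy_chunk` and `BUF_SIZE` are hypothetical names chosen for this sketch.
fn copy_chunk<R: Read, W: Write>(
    reader: &mut R,
    writer: &mut W,
    length: u64,
) -> io::Result<u64> {
    const BUF_SIZE: usize = 64 * 1024; // arbitrary size, tune as needed
    let mut buf = vec![0u8; BUF_SIZE];
    let mut remaining = length;

    while remaining > 0 {
        // Never ask for more bytes than this chunk is still allowed to take.
        let want = remaining.min(BUF_SIZE as u64) as usize;
        let n = match reader.read(&mut buf[..want]) {
            Ok(0) => break, // end of input
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        writer.write_all(&buf[..n])?;
        remaining -= n as u64;
    }
    Ok(length - remaining)
}
```

With a File as the reader and another File as the writer, calling copy_chunk once per output part would keep memory use constant regardless of the part size. The standard library's std::io::copy combined with Read::take should give a similar streaming behavior with less code.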
Upvotes: 1
Reputation: 362107
Yes, you're reading the whole chunk into memory. You can inspect buffer to confirm. If it has length bytes then there you go; there are length bytes in memory. There's no way BufReader could fake that.
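One way to check this for yourself is a small sketch like the following, which wraps the take + read_to_end pattern in a helper and lets you look at the resulting vector. The helper name read_chunk_into_memory is made up for this example.

```rust
use std::io::{self, Read};

/// Reads up to `length` bytes from `source` using `take` + `read_to_end`
/// and returns the filled vector, so its length can be inspected.
/// `read_chunk_into_memory` is a hypothetical name used only in this sketch.
fn read_chunk_into_memory<R: Read>(source: R, length: u64) -> io::Result<Vec<u8>> {
    let mut buffer = Vec::new();
    // `take` caps how many bytes are read; `read_to_end` stores all of them.
    source.take(length).read_to_end(&mut buffer)?;
    Ok(buffer)
}
```

Passing a BufReader (or any other reader) and checking buffer.len() on the result shows that every byte of the chunk ends up in the Vec: the length equals the requested length, or the number of bytes remaining in the source if that is smaller.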
Upvotes: 1