Reputation: 7561
What is the most efficient general-purpose way of reading "large" files (which may be text or binary), without going into unsafe territory? I was surprised how few relevant results there were when I did a web search for "rust read large file in chunks".
For example, one of my use cases is to calculate an MD5 checksum for a file using rust-crypto (the Md5 module allows you to add &[u8] chunks iteratively).
Here is what I have, which seems to perform slightly better than some other methods like read_to_end:
use std::{
    fs::File,
    io::{self, BufRead, BufReader},
};

fn main() -> io::Result<()> {
    const CAP: usize = 1024 * 128;
    let file = File::open("my.file")?;
    let mut reader = BufReader::with_capacity(CAP, file);

    loop {
        let length = {
            let buffer = reader.fill_buf()?;
            // do stuff with buffer here
            buffer.len()
        };
        if length == 0 {
            break;
        }
        reader.consume(length);
    }

    Ok(())
}
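To make the "do stuff with buffer here" step concrete, here is a sketch of the same loop feeding each chunk into an incremental consumer. It uses a simple wrapping byte sum as a stand-in for the rust-crypto Md5 hasher (which accepts the same &[u8] chunks via its input method), and an io::Cursor in place of a real file so the example is self-contained; the function name chunked_checksum is made up for illustration.

```rust
use std::io::{self, BufRead, BufReader, Read};

const CAP: usize = 1024 * 128;

// Folds each chunk into a running state, the same way an incremental
// hasher such as rust-crypto's Md5 would consume it chunk by chunk.
// (The wrapping byte sum here is just a stand-in for a real digest.)
fn chunked_checksum<R: Read>(source: R) -> io::Result<u64> {
    let mut reader = BufReader::with_capacity(CAP, source);
    let mut sum: u64 = 0;
    loop {
        let length = {
            let buffer = reader.fill_buf()?;
            for &byte in buffer {
                sum = sum.wrapping_add(byte as u64);
            }
            buffer.len()
        };
        if length == 0 {
            break;
        }
        reader.consume(length);
    }
    Ok(sum)
}

fn main() -> io::Result<()> {
    // A Cursor stands in for File::open("my.file")? so this runs anywhere.
    let checksum = chunked_checksum(io::Cursor::new(vec![1u8, 2, 3, 4]))?;
    println!("{checksum}"); // prints 10
    Ok(())
}
```

Because chunked_checksum is generic over Read, the same code works unchanged on a File, a TcpStream, or an in-memory Cursor.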
Upvotes: 30
Views: 14135
Reputation: 454
An example of reading a large file in chunks, without loading the entire file into memory at once.
new_file.txt can be a large file.
use std::{
    fs::File,
    io::{BufReader, Read, Write},
    error::Error,
};

const BUFFER_SIZE: usize = 5;

fn main() -> Result<(), Box<dyn Error>> {
    // create new_file.txt
    let path = "new_file.txt";
    let mut output = File::create(path)?;
    writeln!(output, "We will generate this text")?;
    writeln!(output, "Line 2 bla bla bla")?;
    writeln!(output, "Line 3 foo bar for bar")?;

    // read new_file.txt
    let input = File::open(path)?;
    let reader = BufReader::new(input);
    read_file(reader)?;
    Ok(())
}

fn read_file<R: Read>(mut reader: R) -> Result<(), Box<dyn Error>> {
    let mut buffer = [0_u8; BUFFER_SIZE];
    loop {
        let count = reader.read(&mut buffer)?;
        if count == 0 {
            break;
        }
        // Only the first `count` bytes are valid on this iteration.
        let string_slice = std::str::from_utf8(&buffer[..count])?;
        // Reads up to BUFFER_SIZE bytes per loop.
        // Change print! to println! to see the chunk boundaries.
        print!("{string_slice}");
    }
    Ok(())
}
See Rust Playground.
Upvotes: 2
Reputation: 1
I did it this way; I don't know if it's the correct approach, but it worked for me.
use std::io;
use std::io::prelude::*;
use std::fs::File;

fn main() -> io::Result<()> {
    const FNAME: &str = "LargeFile.txt";
    const CHUNK_SIZE: usize = 1024; // bytes read per loop iteration
    let mut limit: usize = (1024 * 1024) * 15; // read at most 15 MiB of the file

    let mut f = File::open(FNAME)?;
    let mut buffer = [0; CHUNK_SIZE]; // buffer to hold the bytes

    // Read up to 15 MiB, stopping early at end of file.
    while limit > 0 {
        let n = f.read(&mut buffer)?;
        if n == 0 {
            break; // end of file
        }
        // Only the first `n` bytes are valid data on this iteration;
        // parse or process them here.
        for &byte in &buffer[..n] {
            print!("{}", byte as char);
        }
        limit = limit.saturating_sub(n);
    }
    Ok(())
}
Upvotes: 0
Reputation: 2393
I don't think you can write code more efficient than that. fill_buf on a BufReader over a File is basically just a straight call to read(2).
That said, BufReader isn't really a useful abstraction when you use it like that; it would probably be less awkward to just call file.read(&mut buf) directly.
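A minimal sketch of that direct-read approach, assuming you manage the buffer yourself: read into a fixed-size array in a loop and treat a return of 0 as end of input. The function name read_in_chunks is made up, and an io::Cursor stands in for a real file so the example is self-contained.

```rust
use std::io::{self, Read};

// Reads any `Read` source in fixed-size chunks with a plain `read`
// loop -- no BufReader needed, since we supply our own buffer.
// Returns the total number of bytes read.
fn read_in_chunks<R: Read>(mut source: R) -> io::Result<usize> {
    let mut buf = [0u8; 1024 * 128];
    let mut total = 0;
    loop {
        let n = source.read(&mut buf)?;
        if n == 0 {
            break; // end of input
        }
        // Only &buf[..n] holds valid data on this iteration.
        total += n;
    }
    Ok(total)
}

fn main() -> io::Result<()> {
    // A Cursor stands in for File::open("my.file")? so this runs anywhere.
    let total = read_in_chunks(io::Cursor::new(vec![0u8; 5000]))?;
    println!("{total} bytes"); // prints "5000 bytes"
    Ok(())
}
```

Note that read may return fewer bytes than the buffer holds even before EOF, which is why the loop keys off `n == 0` rather than a short read.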
Upvotes: 14