Jacob Brown

Reputation: 7561

What is the most efficient way to read a large file in chunks without loading the entire file in memory at once?

What is the most efficient general purpose way of reading "large" files (which may be text or binary), without going into unsafe territory? I was surprised how few relevant results there were when I did a web search for "rust read large file in chunks".

For example, one of my use cases is to calculate an MD5 checksum for a file using rust-crypto (the Md5 module allows you to add &[u8] chunks iteratively).

Here is what I have, which seems to perform slightly better than some other methods like read_to_end:

use std::{
    fs::File,
    io::{self, BufRead, BufReader},
};

fn main() -> io::Result<()> {
    const CAP: usize = 1024 * 128;
    let file = File::open("my.file")?;
    let mut reader = BufReader::with_capacity(CAP, file);

    loop {
        let length = {
            let buffer = reader.fill_buf()?;
            // do stuff with buffer here
            buffer.len()
        };
        if length == 0 {
            break;
        }
        reader.consume(length);
    }

    Ok(())
}
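For the checksum use case, the same fill_buf/consume loop can feed each chunk into the hasher's update call. A minimal, self-contained sketch of that pattern, substituting a trivial running byte-sum for the rust-crypto Md5 hasher so it needs no external crate (the sample path and contents are illustrative):

```rust
use std::{
    fs::File,
    io::{self, BufRead, BufReader, Write},
};

const CAP: usize = 1024 * 128;

/// Reads `reader` chunk by chunk via fill_buf/consume, feeding each
/// chunk into a running byte sum (the stand-in for `md5.input(chunk)`).
/// Returns (total bytes read, checksum).
fn chunked_checksum<R: BufRead>(mut reader: R) -> io::Result<(usize, u64)> {
    let mut total = 0usize;
    let mut checksum: u64 = 0;
    loop {
        let length = {
            let buffer = reader.fill_buf()?;
            for &byte in buffer {
                checksum = checksum.wrapping_add(byte as u64);
            }
            buffer.len()
        };
        if length == 0 {
            break;
        }
        total += length;
        reader.consume(length);
    }
    Ok((total, checksum))
}

fn main() -> io::Result<()> {
    // Create a small sample file so the example is self-contained.
    let path = "sample.bin";
    File::create(path)?.write_all(b"hello chunked world")?;

    let file = File::open(path)?;
    let (total, checksum) = chunked_checksum(BufReader::with_capacity(CAP, file))?;
    println!("read {total} bytes, checksum {checksum}");
    Ok(())
}
```

With a real hasher, the byte-sum lines would simply become one `md5.input(buffer)` call inside the loop.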

Upvotes: 30

Views: 14135

Answers (3)

Claudio Fsr

Reputation: 454

An example of reading a large file in chunks without loading the entire file into memory at once.

See:

rust-cookbook

new_file.txt can be a large file.

use std::{
    fs::File,
    io::{BufReader, Read, Write}, 
    error::Error,
};

const BUFFER_SIZE: usize = 5;

fn main() -> Result<(), Box<dyn Error>> {
    // create new_file.txt
    let path = "new_file.txt";
    let mut output = File::create(path)?;
    writeln!(output, "We will generate this text")?;
    writeln!(output, "Line 2 bla bla bla")?;
    writeln!(output, "Line 3 foo bar for bar")?;

    // read new_file.txt
    let input = File::open(path)?;
    let reader = BufReader::new(input);
    read_file(reader)?;

    Ok(())
}

fn read_file<R: Read>(mut reader: R) -> Result<(), Box<dyn Error>> {
    let mut buffer = [0_u8; BUFFER_SIZE];

    loop {
        let count = reader.read(&mut buffer)?;
        if count == 0 {
            break;
        }
        // Only the first `count` bytes are valid on each iteration.
        // Note: a chunk boundary can split a multi-byte UTF-8 sequence
        // and make `from_utf8` fail; the sample text here is ASCII.
        let string_slice = std::str::from_utf8(&buffer[..count])?;

        // Reads BUFFER_SIZE bytes per loop iteration.
        // Change print! to println! to see the chunk boundaries.
        print!("{string_slice}");
    }

    Ok(())
}

See Rust Playground.

Upvotes: 2

Hossin Azmoud

Reputation: 1

I did it this way; I don't know if it is the correct approach, but it worked for me.

use std::io;
use std::io::prelude::*;
use std::fs::File;

fn main() -> io::Result<()> 
{
    const FNAME: &str = "LargeFile.txt";
    const CHUNK_SIZE: usize = 1024; // bytes read by every loop iteration.
    let mut limit: usize = (1024 * 1024) * 15; // How much should be actually read from the file..
    let mut f = File::open(FNAME)?;
    let mut buffer = [0; CHUNK_SIZE]; // buffer to contain the bytes.

    // read up to 15 MiB, as `limit` suggests..
    while limit > 0 {
        let n = f.read(&mut buffer[..])?;
        if n == 0 {
            break; // end of file reached before the limit
        }
        // Only the first `n` bytes are valid; `read` may fill less
        // than the whole buffer.
        for &byte in &buffer[..n] {
            print!("{}", byte as char);
        }
        limit = limit.saturating_sub(n);
    }
    Ok(())
}

Upvotes: 0

Eli Friedman

Reputation: 2393

I don't think you can write code more efficient than that. fill_buf on a BufReader over a File is basically just a straight call to read(2).

That said, BufReader isn't really a useful abstraction when you use it like that; it would probably be less awkward to just call file.read(&mut buf) directly.
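A sketch of that direct approach, assuming a fixed stack buffer; note that `read` is allowed to return fewer bytes than the buffer holds, so only the first `n` bytes are valid each iteration (the file path and contents here are illustrative):

```rust
use std::{
    fs::File,
    io::{self, Read, Write},
};

/// Reads `reader` in chunks with plain `read`, counting bytes.
fn count_bytes<R: Read>(mut reader: R) -> io::Result<usize> {
    let mut buffer = [0u8; 8192];
    let mut total = 0;
    loop {
        let n = reader.read(&mut buffer)?;
        if n == 0 {
            break; // end of file
        }
        // process &buffer[..n] here
        total += n;
    }
    Ok(total)
}

fn main() -> io::Result<()> {
    // Create a sample file so the example is self-contained.
    let path = "my.file";
    File::create(path)?.write_all(&[0u8; 20_000])?;

    let total = count_bytes(File::open(path)?)?;
    println!("read {total} bytes");
    Ok(())
}
```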

Upvotes: 14
