Marcel

What is a faster way to iterate through the bytes of a file in Rust?

I'm new to Rust and I'm trying to write a simple backup program. As a first step, the files are split into variable-length blocks using content-defined chunking.
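
Since the chunking code is omitted from the question, here is a hypothetical illustration (not the asker's code) of why content-defined chunking has to look at every byte. It is a minimal sketch of a Gear rolling hash, the kind used by FastCDC, that cuts one in-memory slice into chunks. MIN_SIZE, MAX_SIZE, MASK and the SplitMix64-generated table are assumptions chosen for illustration. A real chunker reading a file in blocks would also have to carry the hash and the partial chunk across buffer refills.

```rust
// Hypothetical parameters chosen for illustration only, not tuned values.
const MIN_SIZE: usize = 2 * 1024;
const MAX_SIZE: usize = 64 * 1024;
// Test the top 13 bits of the hash: on random data a cut happens on
// average about 2^13 bytes after MIN_SIZE.
const MASK: u64 = u64::MAX << 51;

/// Build a 256-entry table of pseudo-random values with SplitMix64
/// so the sketch needs no external crates.
fn gear_table() -> [u64; 256] {
    let mut table = [0u64; 256];
    let mut state: u64 = 0;
    for entry in table.iter_mut() {
        state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        *entry = z ^ (z >> 31);
    }
    table
}

/// Return the lengths of the content-defined chunks of `data`.
fn chunk_lengths(data: &[u8], table: &[u64; 256]) -> Vec<usize> {
    let mut lengths = Vec::new();
    let mut hash: u64 = 0;
    let mut start = 0;
    for (i, &b) in data.iter().enumerate() {
        // Gear update: shifting left pushes older bytes out after 64 steps,
        // so the hash only depends on a sliding window of recent bytes.
        hash = (hash << 1).wrapping_add(table[b as usize]);
        let len = i + 1 - start;
        if (len >= MIN_SIZE && (hash & MASK) == 0) || len >= MAX_SIZE {
            lengths.push(len);
            start = i + 1;
            hash = 0;
        }
    }
    if start < data.len() {
        lengths.push(data.len() - start); // trailing partial chunk
    }
    lengths
}

fn main() {
    let table = gear_table();
    // Deterministic pseudo-random input (a simple LCG) in place of file data.
    let mut seed: u32 = 12_345;
    let data: Vec<u8> = (0..1_000_000)
        .map(|_| {
            seed = seed.wrapping_mul(1_103_515_245).wrapping_add(12_345);
            (seed >> 16) as u8
        })
        .collect();
    let lengths = chunk_lengths(&data, &table);
    let total: usize = lengths.iter().sum();
    println!("{} chunks covering {} bytes", lengths.len(), total);
}
```

Because a cut point depends only on the bytes near it, inserting data early in a file shifts later boundaries along with the content instead of changing every block, which is the point of chunking this way for backups.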

To do this, I have to read the file byte by byte. Unfortunately, this turns out to be terribly slow: with dd I can read at up to 350 MiB/s, but with the following Rust code I only get about 45 MiB/s. (I have left out all the chunking code.)

The file I am reading is around 7.7 GiB in size.

// main.rs

use std::fs::File;
use std::io::{BufReader, Read};
use std::time::Instant;

fn main() {
    let file = File::open("something_big.tar").expect("Cannot read file.");

    // bytes() takes the reader by value, so it does not need to be `mut`.
    let buf = BufReader::new(file);

    let mut x = 0u8;
    let mut num_bytes = 0usize;

    let t1 = Instant::now();

    for b in buf.bytes() {
        match b {
            Ok(b) => {
                // wrapping_add: a plain `+=` on u8 panics on overflow in debug builds.
                x = x.wrapping_add(b);
                num_bytes += 1;
                // chunking stuff omitted
            }
            Err(e) => panic!("I/O error: {}", e),
        }
    }

    let dur = t1.elapsed().as_secs_f64();
    let mib = (num_bytes as f64) / 1_048_576f64;

    println!("RESULT: {}", x);
    println!("Read speed: {:.1} MiB / s", mib / dur);
}

Question: What is a faster way to iterate through the bytes of a file in Rust? And what is wrong with my code?

I know I could probably use the memmap crate or something similar, but I don't want to.

Answers (1)

sebpuetz

I'm not sure why this is happening, but I see much higher read speeds when calling read() on the BufReader manually. With the 512-byte array below I get ~2700 MiB/s; with a single-byte array it's around 300 MiB/s.

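The Bytes iterator apparently adds some overhead: each call to Bytes::next does a separate read() into a one-byte buffer. The code below is more or less copy-pasted from its Iterator implementation, but reads into a 512-byte array so that per-call cost is spread over many bytes.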

use std::fs::File;
use std::io::{BufReader, ErrorKind, Read};
use std::time::Instant;

fn main() {
    let file = File::open("some-3.3gb-file")
        .expect("Cannot read file.");

    let mut buf = BufReader::new(file);

    let mut x = 0u8;
    let mut num_bytes = 0usize;

    let t1 = Instant::now();

    // Read up to 512 bytes per call instead of one byte per call.
    let mut bytes = [0u8; 512];
    loop {
        match buf.read(&mut bytes) {
            Ok(0) => break, // EOF
            Ok(n) => {
                // Only the first n bytes of the array are valid.
                for &b in &bytes[..n] {
                    x = x.wrapping_add(b);
                }
                num_bytes += n;
            }
            // Retry reads interrupted by a signal, as Bytes::next does.
            Err(ref e) if e.kind() == ErrorKind::Interrupted => continue,
            Err(e) => panic!("{:?}", e),
        }
    }

    let dur = t1.elapsed().as_secs_f64();
    let mib = (num_bytes as f64) / 1_048_576f64;

    println!("RESULT: {}", x);
    println!("Read speed: {:.1} MiB / s for {:.1} MiB", mib / dur, mib);
}
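
To illustrate where the per-byte overhead comes from, here is a sketch of roughly what std's Bytes::next looked like around the time of this answer. Later Rust releases may have optimized it, so treat this as an approximation. next_byte is a hypothetical helper name, and the in-memory slice stands in for the file.

```rust
use std::io::{self, BufReader, ErrorKind, Read};

/// Hypothetical helper approximating Bytes::next: every call issues a
/// separate read() into a one-byte buffer.
fn next_byte<R: Read>(reader: &mut R) -> Option<io::Result<u8>> {
    let mut byte = [0u8; 1];
    loop {
        match reader.read(&mut byte) {
            Ok(0) => return None, // EOF
            Ok(_) => return Some(Ok(byte[0])),
            // Retry reads interrupted by a signal.
            Err(ref e) if e.kind() == ErrorKind::Interrupted => continue,
            Err(e) => return Some(Err(e)),
        }
    }
}

fn main() -> io::Result<()> {
    let data = vec![1u8; 10_000];
    let mut reader = BufReader::new(&data[..]);
    let mut sum = 0u8;
    let mut count = 0usize;
    // One function call, one copy out of BufReader and one Result per byte.
    while let Some(b) = next_byte(&mut reader) {
        sum = sum.wrapping_add(b?);
        count += 1;
    }
    println!("{} bytes, checksum {}", count, sum); // prints: 10000 bytes, checksum 16
    Ok(())
}
```

BufReader keeps each of those read() calls cheap (no syscall per byte), but they are still not free, which is consistent with the single-byte array being much slower than the 512-byte one.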

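Another option that may be faster still, though I haven't measured it against the numbers above: BufRead::fill_buf and BufRead::consume expose BufReader's internal buffer directly, so the bytes can be scanned in place without copying them into a separate array first. A sketch follows; checksum is a hypothetical helper name, and the command-line argument handling is only there so it runs without the large file.

```rust
use std::env;
use std::fs::File;
use std::io::{self, BufRead, BufReader, ErrorKind};
use std::time::Instant;

/// Hypothetical helper: wrapping byte sum and byte count of everything in
/// `reader`, scanning the reader's internal buffer in place.
fn checksum<R: BufRead>(mut reader: R) -> io::Result<(u8, usize)> {
    let mut x = 0u8;
    let mut num_bytes = 0usize;
    loop {
        let buf = match reader.fill_buf() {
            Ok(buf) => buf,
            // Retry reads interrupted by a signal.
            Err(ref e) if e.kind() == ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        if buf.is_empty() {
            break; // EOF
        }
        for &b in buf {
            x = x.wrapping_add(b);
            // chunking would happen here
        }
        let len = buf.len();
        num_bytes += len;
        // Mark these bytes as used so the next fill_buf() refills the buffer.
        reader.consume(len);
    }
    Ok((x, num_bytes))
}

fn main() -> io::Result<()> {
    let t1 = Instant::now();
    let (x, num_bytes) = match env::args().nth(1) {
        // e.g. something_big.tar from the question
        Some(path) => checksum(BufReader::new(File::open(path)?))?,
        // No argument: use in-memory data so the sketch runs anywhere.
        None => checksum(&vec![1u8; 10_000][..])?,
    };
    let dur = t1.elapsed().as_secs_f64();
    let mib = num_bytes as f64 / 1_048_576f64;

    println!("RESULT: {}", x);
    println!("Read speed: {:.1} MiB / s for {:.1} MiB", mib / dur, mib);
    Ok(())
}
```

Run it with a path, e.g. `cargo run --release -- something_big.tar`. BufReader::with_capacity can be used to try buffers larger than the default.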