Why is my Rust slower than my C - memory manipulation?

Question

I have two implementations that yield the same Vec in the end; however, my C implementation is about 2x as fast as the Rust implementation. Here are my implementations:

use std::time::SystemTime;

extern {
    fn array_test(buffer: *mut u8, length: u32);
}

fn main() {
    let mut vectest = Vec::<u8>::with_capacity(2 << 20);
    unsafe { // using C function
        vectest.set_len(2 << 20);
        let start = SystemTime::now();
        array_test(vectest.as_mut_ptr(), 2 << 20);
        let end = SystemTime::now();
        println!("C func took {:?}", end.duration_since(start));
    };

    let mut vectest2 = Vec::<u8>::with_capacity(2 << 20);
    unsafe { // pure Rust
        vectest2.set_len(2 << 20);
        let start = SystemTime::now();
        let vectest2_ptr = vectest2.as_mut_ptr();
        for i in (0..2 << 20).step_by(4) {
            *(vectest2_ptr.add(i) as *mut u32) = i as u32;
        }
        let end = SystemTime::now();
        println!("Rust took {:?}", end.duration_since(start));
    };
}

C function:

void array_test(char *buffer, unsigned int len) {
    for (unsigned int i = 0; i < len; i += 4) {
        (*(int *)(buffer+i)) = i;
    }
}

One set of timings from running cargo run --release:

C func took Ok(440.692µs)
Rust took Ok(921.658µs)

Nearly every time I run it, the C version finishes about twice as fast.

Is this just due to poor Rust code optimization on my part? If so, what could I do to close the gap between my C and my Rust?

zgerd · Accepted Answer

If you rewrite your loop:

for i in (0..2 << 20).step_by(4) {
   *(vectest2_ptr.add(i) as *mut u32) = i as u32;
}

as

let mut i = 0;
// Strictly less than: `<=` would write 4 bytes past the end of the buffer.
while i < (2 << 20) {
    *(vectest2_ptr.add(i) as *mut u32) = i as u32;
    i += 4;
}

the performance difference you're seeing goes away.

From a quick glance at the assembly on rust.godbolt.org (passing -O to rustc), it looks like rustc/LLVM is able to vectorize the loop (with SIMD) when using the while variant, but it isn't applying that same optimization to your for loop.
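
For comparison, here is a minimal sketch of the same fill written without unsafe. It assumes a zero-initialized buffer and native-endian values, which matches what the pointer casts in the question produce. The function name fill_offsets is hypothetical, and the timing uses std::time::Instant instead of SystemTime. Whether rustc/LLVM vectorizes this form depends on the compiler version, so it is worth checking the assembly the same way before relying on it.

```rust
use std::time::Instant;

/// Writes each 4-byte group's own byte offset into that group as a
/// native-endian u32, the same layout the loops in the question produce.
/// (`fill_offsets` is a hypothetical name used only for this sketch.)
fn fill_offsets(buffer: &mut [u8]) {
    // chunks_exact_mut(4) yields non-overlapping 4-byte slices; any
    // trailing remainder shorter than 4 bytes is left untouched.
    for (index, chunk) in buffer.chunks_exact_mut(4).enumerate() {
        let offset = (index * 4) as u32;
        chunk.copy_from_slice(&offset.to_ne_bytes());
    }
}

fn main() {
    // Zero-initialized, so no set_len on uninitialized memory is needed.
    let mut buffer = vec![0u8; 2 << 20];

    let start = Instant::now();
    fill_offsets(&mut buffer);
    let elapsed = start.elapsed();
    println!("safe Rust took {:?}", elapsed);

    // Read a value back so the work is observably used.
    println!("last word bytes: {:?}", &buffer[buffer.len() - 4..]);
}
```

A single run like this is noisy, so comparing several runs (or the generated assembly) gives a more reliable picture than one timing.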
