nametable

Reputation: 106

Why is my Rust slower than my C - memory manipulation?

I have two implementations that yield the same Vec in the end; however, my C implementation is about 2x as fast as the Rust implementation. Here are my implementations:

use std::time::SystemTime;

// Implemented in C (see below) and linked into the binary.
extern "C" {
    fn array_test(buffer: *mut u8, length: u32);
}

fn main() {
    let mut vectest = Vec::<u8>::with_capacity(2 << 20);
    unsafe { // using C function
        vectest.set_len(2 << 20);
        let start = SystemTime::now();
        array_test(vectest.as_mut_ptr(), 2 << 20);
        let end = SystemTime::now();
        println!("C func took {:?}", end.duration_since(start));
    };

    let mut vectest2 = Vec::<u8>::with_capacity(2 << 20);
    unsafe { // pure Rust
        vectest2.set_len(2 << 20);
        let start = SystemTime::now();
        let vectest2_ptr = vectest2.as_mut_ptr();
        for i in (0..2 << 20).step_by(4) {
            *(vectest2_ptr.add(i) as *mut u32) = i as u32;
        }
        let end = SystemTime::now();
        println!("Rust took {:?}", end.duration_since(start));
    };
}

C function:

void array_test(char *buffer, unsigned int len) {
    for (unsigned int i = 0; i < len; i += 4) {
        (*(int *)(buffer+i)) = i;
    }
}

One set of timings from running cargo run --release:

C func took Ok(440.692µs)
Rust took Ok(921.658µs)

On almost every run, the C version finishes about twice as fast.

Is this just due to poor Rust code optimization on my part? If so, what could I do to close the gap between my C and my Rust?

Upvotes: 2

Views: 802

Answers (1)

zgerd

Reputation: 1068

If you rewrite your loop:

for i in (0..2 << 20).step_by(4) {
    *(vectest2_ptr.add(i) as *mut u32) = i as u32;
}

as

let mut i = 0;
// Use `<`, not `<=`: a write at i == 2 << 20 would land past the end of the buffer.
while i < (2 << 20) {
    *(vectest2_ptr.add(i) as *mut u32) = i as u32;
    i += 4;
}

the performance difference you're seeing goes away.
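To check this on your own machine, here is a minimal sketch that puts both loop shapes behind functions and times them the same way. The names fill_for, fill_while, time_it and LEN are made up for this example. It also swaps in Instant for SystemTime and uses write_unaligned instead of a plain pointer store, so the generated code may differ slightly from your original.

```rust
use std::time::{Duration, Instant};

/// 2 MiB, the same buffer size as in the question.
const LEN: usize = 2 << 20;

/// Loop shape from the question: a range iterator with `step_by(4)`.
fn fill_for(buf: &mut [u8]) {
    assert!(buf.len() % 4 == 0);
    let ptr = buf.as_mut_ptr();
    for i in (0..buf.len()).step_by(4) {
        // SAFETY: i + 4 <= buf.len() because the length is a multiple of 4.
        // write_unaligned avoids assuming the u8 buffer is 4-byte aligned.
        unsafe { (ptr.add(i) as *mut u32).write_unaligned(i as u32) };
    }
}

/// Same work written as a manual `while` loop.
fn fill_while(buf: &mut [u8]) {
    assert!(buf.len() % 4 == 0);
    let ptr = buf.as_mut_ptr();
    let mut i = 0;
    while i < buf.len() {
        // SAFETY: same bounds argument as in fill_for.
        unsafe { (ptr.add(i) as *mut u32).write_unaligned(i as u32) };
        i += 4;
    }
}

/// Times a single call of `f` on `buf` using a monotonic clock.
fn time_it(f: fn(&mut [u8]), buf: &mut [u8]) -> Duration {
    let start = Instant::now();
    f(buf);
    let elapsed = start.elapsed();
    // Keep the optimizer from discarding the writes.
    std::hint::black_box(&*buf);
    elapsed
}
```

From main you would allocate vec![0u8; LEN], call time_it(fill_for, &mut buf) and time_it(fill_while, &mut buf) several times each, and compare the durations. A single run in the microsecond range is noisy, so treat one measurement as a hint rather than a result.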

From a quick glance at the assembly on rust.godbolt.org (passing -O to rustc), it looks like rustc/LLVM is able to vectorize the loop (with SIMD) when using the while variant, but it isn't applying that same optimization to your for loop.
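If you would rather not use unsafe at all, a slice-based version is another option. This is only a sketch: fill_chunks is a name invented for this example, and I haven't measured it, so whether it gets vectorized depends on your rustc/LLVM version and is worth checking in the assembly.

```rust
/// Writes the byte offset of each 4-byte group into that group as a
/// native-endian u32, matching what the pointer-based loops store.
/// Any trailing bytes that do not fill a whole group are left untouched.
fn fill_chunks(buf: &mut [u8]) {
    for (n, chunk) in buf.chunks_exact_mut(4).enumerate() {
        let offset = (n * 4) as u32;
        chunk.copy_from_slice(&offset.to_ne_bytes());
    }
}
```

Because chunks_exact_mut hands out fixed-size, non-overlapping slices, there is no pointer arithmetic to get wrong and no alignment assumption on the u8 buffer.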

Upvotes: 5
