Reputation: 106
I have two implementations that yield the same Vec in the end, but my C implementation is about 2x as fast as the Rust one. Here are my implementations:
use std::time::SystemTime;

extern "C" {
    fn array_test(buffer: *mut u8, length: u32);
}

fn main() {
    let mut vectest = Vec::<u8>::with_capacity(2 << 20);
    unsafe { // using C function
        vectest.set_len(2 << 20);
        let start = SystemTime::now();
        array_test(vectest.as_mut_ptr(), 2 << 20);
        let end = SystemTime::now();
        println!("C func took {:?}", end.duration_since(start));
    };

    let mut vectest2 = Vec::<u8>::with_capacity(2 << 20);
    unsafe { // pure Rust
        vectest2.set_len(2 << 20);
        let start = SystemTime::now();
        let vectest2_ptr = vectest2.as_mut_ptr();
        // write the byte offset as a u32 at every 4th byte
        for i in (0..2 << 20).step_by(4) {
            *(vectest2_ptr.add(i) as *mut u32) = i as u32;
        }
        let end = SystemTime::now();
        println!("Rust took {:?}", end.duration_since(start));
    };
}
C function:
void array_test(char *buffer, unsigned int len) {
    for (unsigned int i = 0; i < len; i += 4) {
        (*(int *)(buffer+i)) = i;
    }
}
One sample of the timings from running cargo run --release is:
C func took Ok(440.692µs)
Rust took Ok(921.658µs)
Nearly every time I run it, the C version finishes about twice as fast.
Is this just due to poor Rust code optimization on my part? If so, what could I do to close the gap between my C and my Rust?
Upvotes: 2
Views: 802
Reputation: 1068
If you re-write your loop:
    for i in (0..2 << 20).step_by(4) {
        *(vectest2_ptr.add(i) as *mut u32) = i as u32;
    }
as
    let mut i = 0;
    while i < (2 << 20) {
        *(vectest2_ptr.add(i) as *mut u32) = i as u32;
        i += 4;
    }
the performance difference you're seeing goes away.
From a quick glance at the assembly on rust.godbolt.org (passing -O to rustc), it looks like rustc/LLVM is able to vectorize the loop (with SIMD) when using the while variant, but it doesn't apply that same optimization to your for loop.
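If you want to check this on your own machine, here is a minimal, self-contained sketch that times both loop shapes without the C dependency. The function names (fill_for, fill_while, fill_chunks, time_it) are made up for illustration and are not from the question. It makes three assumptions worth flagging: it uses std::time::Instant rather than SystemTime, since Instant is the monotonic clock intended for measuring elapsed time; it uses write_unaligned, because a u8 buffer is not guaranteed to be 4-byte aligned; and it adds a safe chunks_exact_mut variant purely for comparison. Whether any of these gets vectorized depends on your compiler version and target, so treat the printed timings as something to measure, not a promise.

```rust
use std::time::Instant;

const LEN: usize = 2 << 20; // same buffer size as in the question

// The question's loop shape: a range with step_by.
fn fill_for(buf: &mut [u8]) {
    let len = buf.len();
    assert!(len % 4 == 0);
    let ptr = buf.as_mut_ptr();
    for i in (0..len).step_by(4) {
        // Unaligned write, since a u8 buffer need not be 4-byte aligned.
        unsafe { (ptr.add(i) as *mut u32).write_unaligned(i as u32) };
    }
}

// The while-loop shape, with a strict `<` bound so the last
// 4-byte write stays inside the buffer.
fn fill_while(buf: &mut [u8]) {
    let len = buf.len();
    assert!(len % 4 == 0);
    let ptr = buf.as_mut_ptr();
    let mut i = 0;
    while i < len {
        unsafe { (ptr.add(i) as *mut u32).write_unaligned(i as u32) };
        i += 4;
    }
}

// A safe variant (not from the original posts), added for comparison.
fn fill_chunks(buf: &mut [u8]) {
    for (n, chunk) in buf.chunks_exact_mut(4).enumerate() {
        chunk.copy_from_slice(&((n * 4) as u32).to_ne_bytes());
    }
}

// Runs `f` on a fresh zeroed buffer and prints how long it took.
fn time_it(name: &str, f: fn(&mut [u8])) -> Vec<u8> {
    let mut buf = vec![0u8; LEN];
    let start = Instant::now();
    f(&mut buf);
    println!("{} took {:?}", name, start.elapsed());
    buf
}

fn main() {
    let a = time_it("for/step_by", fill_for);
    let b = time_it("while", fill_while);
    let c = time_it("chunks_exact_mut", fill_chunks);
    // All three variants should produce identical bytes.
    assert!(a == b && b == c);
}
```

Build with cargo run --release (or rustc -O) and run it several times. A single run on a 2 MiB buffer takes well under a millisecond, so the numbers are noisy, and the first function to run may also pay cache warm-up costs.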
Upvotes: 5