Reputation:
These two loops are supposed to be equivalent in C++ and Rust:
#include <cstdint>
std::uint64_t sum1(std::uint64_t n) {
std::uint64_t sum = 0;
for (std::uint64_t j = 0; j <= n; ++j) {
sum += 1;
}
return sum;
}
pub fn sum1(num: u64) -> u64 {
let mut sum: u64 = 0;
for j in 0u64..=num {
sum += 1;
}
return sum;
}
However the C++ version generates a very terse assembly:
sum1(unsigned long): # @sum1(unsigned long)
xor eax, eax
.LBB0_1: # =>This Inner Loop Header: Depth=1
add rax, 1
cmp rax, rdi
jbe .LBB0_1
ret
while Rust's version is very long with two checks in the loop instead of one:
example::sum1:
xor eax, eax
xor ecx, ecx
.LBB0_1:
mov rdx, rcx
cmp rcx, rdi
adc rcx, 0
add rax, 1
cmp rdx, rdi
jae .LBB0_3
cmp rcx, rdi
jbe .LBB0_1
.LBB0_3:
ret
Godbolt: https://godbolt.org/z/xYW94qxjK
What is Rust intrinsically trying to prevent that C++ is carefree about?
Upvotes: 53
Views: 4960
Reputation: 299910
@user3840170 correctly identified the difference: overflow check in Rust, and not in C++.
Still, the question remains as to why there are 2 checks per loop in the Rust version instead of 1, and the answer to that is that LLVM is not sufficiently smart and/or the RangeInclusive
design is not well adapted to LLVM1.
The optimal code generation for short loops, is to split the loop, transforming:
for j in 0u64..=num {
sum += 1;
}
Into:
for j in 0u64..num { // equivalent to for (auto j = 0; j < num; ++j)
sum += 1;
}
if 0 <= num {
sum += 1;
}
This would lead to having a single branch in the main loop, and allow LLVM to switch this to a closed formula2.
The Loop Splitting optimization applies to RangeInclusive
and most other Chain
iterators, as indeed a RangeInclusive
can be thought of as a chain of a once iterator and half-exclusive range iterator (in either order). It is not always a win: like inlining, it implies duplicating the code of the loop body, which if large may lead to a significant overhead in code size.
Unfortunately, LLVM fails to split the loop. I am not sure if it's because the optimization is missing altogether, or it just fails to apply it here for some reason. I'm looking forward to the rustc_codegen_gcc
backend, as GCC 7 added Loop Splitting to GCC, and it may be able to generate better code there.
1 See this comment I left over performance issues with RangeInclusive
. I spent significant time banging my head over the issue in 2019, and I dearly wish RangeInclusive
(and all ranges) were NOT Iterator
; it's a costly mistake.
2 Chain iterators, in general, perform much better using .for_each(...)
, specifically because there the loop is (manually) split. See the playground for (0..=num).for_each(|_| sum += 1)
being reduced to num + 1
.
Upvotes: 17
Reputation: 27219
Overflow in the iterator state.
The C++ version will loop forever when given a large enough input:
#include <cstdint>
std::uint64_t sum1(std::uint64_t n) {
std::uint64_t sum = 0;
for (std::uint64_t j = 0; j <= n; ++j) {
__asm__ ("");
sum += 1;
}
return sum;
}
#include <iostream>
int main() {
std::cout << sum1(UINT64_C(0xffffffff'ffffffff)) << std::endl;
return 0;
}
This is because when the loop counter j
finally reaches 0xffffffff'ffffffff
, incrementing it wraps around to 0, which means the loop invariant j <= n
is always fulfilled and the loop never exits.
Strictly speaking, invoking the original version of sum1
with 0xffffffff'ffffffff
infamously triggers undefined behaviour, though not because of overflow alone, but since infinite loops without externally-visible side effects are UB ([intro.progress]/1). This is why in my version I added an empty __asm__
statement to the function to act as a dummy ‘side effect’ preventing the compiler from taking ‘advantage’ of that in optimisation passes.
The Rust version, on the other hand, is not only perfectly well-defined, but iterates exactly as many times as the cardinality of the range:
use std::num::Wrapping;
fn sum1(num: u64) -> u64 {
let mut sum = Wrapping(0);
for _ in 0..=num {
sum += Wrapping(1);
}
return sum.0;
}
fn main() {
println!("{}", sum1(0xffffffff_ffffffff));
}
The above program (slightly modified to avoid getting bogged down in debug versus release mode differences with respect to the summation) will terminate after exactly 18 446 744 073 709 551 616 iterations and print 0; the Rust version carefully maintains iterator state so that overflow does not happen in the iterator.
Upvotes: 46
Reputation: 26727
These two loops are equivalent in C++ and Rust
Your two code snippets don't share the same behavior. for (uint64_t j = 0; j <= n; ++j)
doesn't handle n == uint64_t::MAX
(make it infinite looping) while for j in 0u64..=num
do (will never go into an infinite loop).
A rust equivalent code could be:
pub fn sum1(num: u64) -> u64 {
let mut sum: u64 = 0;
let mut j = 0;
while j <= num {
sum = sum.wrapping_add(1);
j = j.wrapping_add(1);
}
sum
}
currently produce the following asm godbolt:
example::sum1:
xor eax, eax
.LBB0_1:
add rax, 1
cmp rax, rdi
jbe .LBB0_1
ret
Upvotes: 18