Reputation: 1491
In the C programming language, it's easy to have tail recursion:
int foo(...) {
return foo(...);
}
Just return as is the return value of the recursive call. It is especially important when this recursion may repeat a thousand or even a million times. It would use a lot of memory on the stack.
Now, I have a Rust function that might recursively call itself a million times:
fn read_all(input: &mut dyn std::io::Read) -> std::io::Result<()> {
match input.read(&mut [0u8]) {
Ok ( 0) => Ok(()),
Ok ( _) => read_all(input),
Err(err) => Err(err),
}
}
(this is a minimal example, the real one is more complex, but it captures the main idea)
Here, the return value of the recursive call is returned as is, but:
Does it guarantee that the Rust compiler will apply a tail recursion?
For instance, if we declare some variable that needs to be destroyed like a std::Vec
, will it be destroyed just before the recursive call (which allows for tail recursion) or after the recursive call returns (which forbids the tail recursion)?
Upvotes: 67
Views: 29489
Reputation: 2393
There is a tailcall crate that you can use today and that adds a #[tailcall]
annotation which forces tailcall optimized code generation.
There is also work to standardize explicit tailcalls here and here.
Upvotes: 11
Reputation: 27885
Shepmaster's answer explains that tail call elimination is merely an optimization, not a guarantee, in Rust. But "never guaranteed" doesn't mean "never happens". Let's take a look at what the compiler does with some real code.
As of right now, the latest release of Rust available on Compiler Explorer is 1.39, and it does not eliminate the tail call in read_all
.
example::read_all:
push r15
push r14
push rbx
sub rsp, 32
mov r14, rdx
mov r15, rsi
mov rbx, rdi
mov byte ptr [rsp + 7], 0
lea rdi, [rsp + 8]
lea rdx, [rsp + 7]
mov ecx, 1
call qword ptr [r14 + 24]
cmp qword ptr [rsp + 8], 1
jne .LBB3_1
movups xmm0, xmmword ptr [rsp + 16]
movups xmmword ptr [rbx], xmm0
jmp .LBB3_3
.LBB3_1:
cmp qword ptr [rsp + 16], 0
je .LBB3_2
mov rdi, rbx
mov rsi, r15
mov rdx, r14
call qword ptr [rip + example::read_all@GOTPCREL]
jmp .LBB3_3
.LBB3_2:
mov byte ptr [rbx], 3
.LBB3_3:
mov rax, rbx
add rsp, 32
pop rbx
pop r14
pop r15
ret
mov rbx, rax
lea rdi, [rsp + 8]
call core::ptr::real_drop_in_place
mov rdi, rbx
call _Unwind_Resume@PLT
ud2
Notice this line: call qword ptr [rip + example::read_all@GOTPCREL]
. That's the (tail) recursive call. As you can tell from its existence, it was not eliminated.
Compare this to an equivalent function with an explicit loop
:
pub fn read_all(input: &mut dyn std::io::Read) -> std::io::Result<()> {
loop {
match input.read(&mut [0u8]) {
Ok ( 0) => return Ok(()),
Ok ( _) => continue,
Err(err) => return Err(err),
}
}
}
which has no tail call to eliminate, and therefore compiles to a function with only one call
in it (to the computed address of input.read
).
Oh well. Maybe Rust isn't as good as C. Or is it?
Here's a tail-recursive function in C that performs a very similar task:
int read_all(FILE *input) {
char buf[] = {0, 0};
if (!fgets(buf, sizeof buf, input))
return feof(input);
return read_all(input);
}
This should be super easy for the compiler to eliminate. The recursive call is right at the bottom of the function and C doesn't have to worry about running destructors. But nevertheless, there's that recursive tail call, annoyingly not eliminated:
call read_all
Tail call optimization is not guaranteed to happen in C, either. No compiler I tried would be convinced to turn this into a loop on its own initiative.
Since version 13, clang supports a non-standard musttail
attribute you can add to tail calls that should be eliminated. Adding this attribute to the C code successfully eliminates the tail call. However, rustc currently has no equivalent attribute (although the become
keyword is reserved for this purpose).
Okay, so it's not guaranteed. Can the compiler do it at all? Yes! Here's a function that computes Fibonacci numbers via a tail-recursive inner function:
pub fn fibonacci(n: u64) -> u64 {
fn f(n: u64, a: u64, b: u64) -> u64 {
match n {
0 => a,
_ => f(n - 1, a + b, a),
}
}
f(n, 0, 1)
}
Not only is the tail call eliminated, the whole fibonacci_lr
function is inlined into fibonacci
, yielding only 12 instructions (and not a call
in sight):
example::fibonacci:
push 1
pop rdx
xor ecx, ecx
.LBB0_1:
mov rax, rdx
test rdi, rdi
je .LBB0_3
dec rdi
add rcx, rax
mov rdx, rcx
mov rcx, rax
jmp .LBB0_1
.LBB0_3:
ret
If you compare this to an equivalent while
loop, the compiler generates almost the same assembly.
You probably shouldn't be relying on optimizations to eliminate tail calls, either in Rust or in C. It's nice when it happens, but if you need to be sure that a function compiles into a tight loop, the surest way, at least for now, is to use a loop.
Upvotes: 80
Reputation: 430378
Neither tail recursion (reusing a stack frame for a tail call to the same function) nor tail call optimization (reusing the stack frame for a tail call to any function) are ever guaranteed by Rust, although the optimizer may choose to perform them.
if we declare some variable that needs to be destroyed
It's my understanding that this is one of the sticking points, as changing the location of destroyed stack variables would be contentious.
See also:
Upvotes: 51