Reputation: 276
While exploring this collection of "language benchmarks" (I know that these kinds of benchmarks have severe limitations. That is not the point of this question), I noticed that the Fortran implementation of the Fibonacci benchmark runs significantly faster than the C version.
The implementation in C is as follows:
int fibonacci(int n) {
if (n == 0) return 0;
if (n == 1) return 1;
return fibonacci(n - 1) + fibonacci(n - 2);
}
and the Fortran implementation is basically identical except of course that the syntax is different:
recursive function fibonacci(n) result(f)
integer(kind=4), intent(in) :: n
integer(kind=4) :: f
if (n == 0) then
f = 0
elseif (n == 1) then
f = 1
else
f = fibonacci(n - 1) + fibonacci(n - 2)
end if
end function fibonacci
Comparing the assembly produced by gcc v14.2 and gfortran v14.2 -O3
, it appears that gfortran produces assembly that is roughly equivalent to the following C code:
int fibonacci(int n) {
if (n == 0) return 0;
if (n == 1) return 1;
if (n == 2) return 1;
if (n == 3) return 2;
int a = fibonacci(n - 3);
int b = fibonacci(n - 4);
if (n == 4) return a + 2;
a += b;
int c = fibonacci(n - 5);
b += c;
a += b;
if (n == 5) return b + 1;
return a + b + c + fibonacci(n - 6);
}
(Actually it also eliminates the final tail call and turns it into a loop)
This results in orders of magnitude fewer recursive calls being made. The compiled C version has complexity O(1.6180...^n)
for calculating the nth Fibonacci number, while the compiled Fortran version is O(1.3803...^n)
.
Why is gfortran
able to make this optimization while gcc
is not? (I assume that they have the same or a similar backend) Is there something in the C standard that makes this optimization not be correct in general? Or is it just an oversight? Is there a way to get gcc
to emit similar code?
Upvotes: 5
Views: 85
Reputation: 900
It looks like it's the difference between call-by-value and call-by-reference. The C code that is equivalent to the Fortran code is
int fibonacci(int *n) {
int m;
if (*n == 0)
m = 0;
else if (*n == 1)
m = 1;
else
{
int n1, n2;
n1 = *n - 1;
n2 = *n - 2;
m = fibonacci(&n1) + fibonacci(&n2);
}
return m;
}
If you compile the Fortran code and the above C code with the -fdump-tree-optimized
option, you'll get two files, e.g., a.f90.269t.optimized and a.c.269t.optimized. If you compare the pseudo-code, you'll see something like
<bb 2> [local count: 1073741824]:
_1 = *n_9(D);
if (_1 == 0)
goto <bb 9>; [50.00%]
else
goto <bb 3>; [50.00%]
<bb 3> [local count: 536870912]:
if (_1 == 1)
goto <bb 9>; [51.12%]
else
goto <bb 4>; [48.88%]
<bb 4> [local count: 262422500]:
if (_1 == 2)
goto <bb 8>; [51.12%]
else
goto <bb 5>; [48.88%]
<bb 5> [local count: 131211250]:
_3 = _1 + -2;
n1 = _3;
_24 = _1 + -3;
n2 = _24;
_25 = fibonacci (&n1);
_26 = fibonacci (&n2);
The call-by-value C code has a much different optimized representation. I'm not sure why it seems to be more complicated. Perhaps, post a MWE to the gcc mailing list for an answer who knows the C frontend better than I.
Upvotes: 3