Dylan
Dylan

Reputation: 276

Why is gfortran able to optimize recursive functions by "diving deeper" into the call tree, but gcc is not?

While exploring this collection of "language benchmarks" (I know that these kinds of benchmarks have severe limitations. That is not the point of this question), I noticed that the Fortran implementation of the Fibonacci benchmark runs significantly faster than the C version.

The implementation in C is as follows:

int fibonacci(int n) {
  if (n == 0) return 0;
  if (n == 1) return 1;
  return fibonacci(n - 1) + fibonacci(n - 2);
}

and the Fortran implementation is basically identical except of course that the syntax is different:

recursive function fibonacci(n) result(f)
    integer(kind=4), intent(in) :: n
    integer(kind=4) :: f

    if (n == 0) then
        f = 0
    elseif (n == 1) then
        f = 1
    else
        f = fibonacci(n - 1) + fibonacci(n - 2)
    end if
end function fibonacci

Comparing the assembly produced by gcc v14.2 and gfortran v14.2 -O3, it appears that gfortran produces assembly that is roughly equivalent to the following C code:

int fibonacci(int n) {
  if (n == 0) return 0;
  if (n == 1) return 1;
  if (n == 2) return 1;
  if (n == 3) return 2;
  int a = fibonacci(n - 3);
  int b = fibonacci(n - 4);
  if (n == 4) return a + 2;
  a += b;
  int c = fibonacci(n - 5);
  b += c;
  a += b;
  if (n == 5) return b + 1;
  return a + b + c + fibonacci(n - 6);
}

(Actually it also eliminates the final tail call and turns it into a loop)

This results in orders of magnitude fewer recursive calls being made. The compiled C version has complexity O(1.6180...^n) for calculating the nth Fibonacci number, while the compiled Fortran version is O(1.3803...^n).

Why is gfortran able to make this optimization while gcc is not? (I assume that they have the same or a similar backend) Is there something in the C standard that makes this optimization not be correct in general? Or is it just an oversight? Is there a way to get gcc to emit similar code?

Upvotes: 5

Views: 85

Answers (1)

steve
steve

Reputation: 900

It looks like it's the difference between call-by-value and call-by-reference. The C code that is equivalent to the Fortran code is

int fibonacci(int *n) {
  int m;
  if (*n == 0) 
    m = 0;
  else if (*n == 1)
    m = 1;
  else
    {
      int n1, n2;
      n1 = *n - 1;
      n2 = *n - 2;
      m = fibonacci(&n1) + fibonacci(&n2);
    }

  return m;
}

If you compile the Fortran code and the above C code with the -fdump-tree-optimized option, you'll get two files, e.g., a.f90.269t.optimized and a.c.269t.optimized. If you compare the pseudo-code, you'll see something like

  <bb 2> [local count: 1073741824]:
  _1 = *n_9(D);
  if (_1 == 0)
    goto <bb 9>; [50.00%]
  else
    goto <bb 3>; [50.00%]

  <bb 3> [local count: 536870912]:
  if (_1 == 1)
    goto <bb 9>; [51.12%]
  else
    goto <bb 4>; [48.88%]

  <bb 4> [local count: 262422500]:
  if (_1 == 2)
    goto <bb 8>; [51.12%]
  else
    goto <bb 5>; [48.88%]

  <bb 5> [local count: 131211250]:
  _3 = _1 + -2;
  n1 = _3;
  _24 = _1 + -3;
  n2 = _24;
  _25 = fibonacci (&n1);
  _26 = fibonacci (&n2);

The call-by-value C code has a much different optimized representation. I'm not sure why it seems to be more complicated. Perhaps, post a MWE to the gcc mailing list for an answer who knows the C frontend better than I.

Upvotes: 3

Related Questions