Greg Kennedy
Greg Kennedy

Reputation: 644

GCC instruction scheduling: how do the algorithms differ?

GCC offers a number of options related to instruction scheduling in the compiler flags. An overview of what this means is on the GCC Wiki for "Instruction Scheduling", but this is well out of date (last updated in 2008). I understand the general idea: allow the compiler to re-order instructions to avoid stalls / achieve better CPU resource usage. The details on the knobs to control this are severely lacking and highly technical, though.

Here's the options that (I think) relate to instruction scheduling, from the current GCC Optimization Options:

And, of course, the multiple optimization level (-O2, -O3, -Os etc) also enable some of these by default.

Time for questions.

  1. What is the difference between these? Is one "better" than the other, more costly in compile time, experimental, limited to only certain targets, etc?
  2. What are the limits of enabling these? e.g. can Modulo Scheduling be turned on with the Selective Scheduler? or are they mutually exclusive?
  3. Where can I find more documentation about this? As I've seen the GCC manual and wiki are both light on information, and searches haven't turned up much more about this topic either. Short of reading commit messages to a mailing list, how would a user (me) learn about what these options do?
  4. Anything else? In this "I don't know what I don't know" situation... what should I know? :)

Upvotes: 1

Views: 244

Answers (1)

Hsu Hau
Hsu Hau

Reputation: 622

Instruction scheduling passes in GCC

There are two instruction scheduling passes in GCC. One performs before the register allocation (RA) pass, and the other performs after. Their corresponding flags are -fschedule-insns and -fschedule-insns2.

(Update 2024-11-14)

From the comment in gcc/sched-rgn.cc:

This pass implements list scheduling within basic blocks. It is run twice: (1) after flow analysis, but before register allocation, and (2) after register allocation.

(The reason for doing instruction scheduling twice is that, instruction passes tries to utilize all hardware resource as much as possible. But the optimization might cause register spill when doing RA. If RA is performed before instruction scheduling, there might have too less room for it. This is one of the the phase-ordering problem in compiler. The compilers I know (GCC, LLVM) performs instruction scheduling twice to mitigate the issue.)

Instruction scheduling algorithms

This section tries to answer "What is the difference between these?".

There are different instruction scheduling algorithms:

Each scheduler has parameters to control its behavior, for example -fmodulo-sched-allow-regmoves tries to do aggressive register movings when performing modulo scheduling. To understand all the flags, one might need to understand the algorithms first and trace the source code for those uses if necessary.

Relationship between the algorithms

This section tris to answer "What are the limits of enabling these". But since there are no clear documents, most of the following are guessing :p.

I guess the first 3 scheduler (block, interblock and selective) are mutual exclusive. Since they are doing similar things with different heuristics. Options like -fselective-scheduling, -fselective-scheduling2, -fno-sched-interblock are used to control which scheduler to be used in the first or second instruction scheduling pass. (And those options seem to mutually exclusive with each others.)

(Update 2024-11-14) From the source code in gcc/sched-rgn.cc, these 3 are mutual exclusive:

/* Run second scheduling pass after reload.  */
static unsigned int
rest_of_handle_sched2 (void)
{
#ifdef INSN_SCHEDULING
  if (flag_selective_scheduling2
      && ! maybe_skip_selective_scheduling ())
    run_selective_scheduling ();
  else
    {
      /* Do control and data sched analysis again,
        |and write some more of the results to dump file.  */
      if (flag_sched2_use_superblocks)
        schedule_ebbs ();
      else
        schedule_insns ();
    }
#endif

I guess the the default scheduler for the first pass (before RA) is the interblock scheduler, since there is an option -fno-sched-interblock to not to use interblock scheduler.

I guess the default scheduler for the second pass (after RA) is the block scheduler , since we have -fsched2-use-superblocks option.

The modulo scheduler is targeting inner loops so can be performed along with other schedulers. But this is not enabled by default, have to use -fmodulo-sched to enable it.

Other questions

"Where can I find more documentation about this?"

Sadly I found nothing nigher. InstructionScheduling provides a quick overview but it is out of date. Optimization options provides some hints about the schedulers, but not so clear.

I tried to search the web, read the comments and codes in the GCC source tree, and use git blame to find more informations. Also reading compiler books to have some background knowledge for instruction scheduling. This answer is all I have ...


Appendix: related GCC source files

  • gcc/haifa-sched.cc: contains shared codes between schedulers. (Although InstructionScheduling doc says the block scheduler is implemented in haifa-sched.cc. But I did some archeology and there seems some refactors had been performed, like this commit and this commit.)

  • gcc/sched-rgn.cc: the entry point of instruction scheduling pass.

  • gcc/sched-ebb.cc: codes of superblock (ebb) scheduler.

  • gcc/sel-sched*: code of selective scheduler.

  • gcc/modulo-sched.cc: code of modulo scheduler.

Upvotes: 1

Related Questions