Arnaud Paran
Arnaud Paran

Reputation: 31

Understanding how GHC-compiled code works at the lowest level

When using a language, I find it useful to understand exactly how my code is compiled. For example, I like using C because I can inspect the disassembled code in a debugger like GDB and quickly see how the C code I write actually behaves.

The Haskell Wiki has some debugging advice, but I haven't found a way to see the disassembled code generated by GHC. Looking at it with GDB isn't very helpful, since I can't figure out how to find the code for a particular function, figure out where a particular variable is stored, set breakpoints or anything else that I can do with a C program.

My question: How can I disassemble code generated by GHC in a way that's useful? Is there a tool to produce an annotated listing of the code generated by GHC? Is there a way to compile functions in a way that you can see how the compiled code "ticks"?

Some additional background and references:

One reason I want to do this is I think this kind of understanding would be extremely important for getting good performance (e.g., C vs Haskell Collatz conjecture speed comparison). The Haskell wiki entry on performance doesn't go into enough detail to cover the answer to that SO question (namely, optimization of n % 2 and n / 2 by C compilers versus lack of optimization of quot n 2 and rem n 2 in GHC). If I could see the disassembled code, I could figure it out for myself.

Some resources I've found included:

These resources are really useful, but to fully understand them, I feel like I need to see actual compiled GHC code in action.

Upvotes: 2

Views: 820

Answers (1)

K. A. Buhr
K. A. Buhr

Reputation: 50864

Running GHC Programs under the Gnu Debugger

The GHC compiler provides some support for debugging programs under GDB. If you have a program:

-- Quot.hs

{-# NOINLINE quot2 #-}
quot2 :: Int -> Int
quot2 n = n `quot` 2

main = print $ quot2 10

you can compile it with the -g flag and run it under GDB using the instructions in the above link:

$ ghc -O2 -g -rtsopts Quot.hs    # Note: I'm using GHC 8.6.5
$ gdb --args ./Quot +RTS -V0

You can set a breakpoint at the line quot2 n = n `quot` 2, disassemble, and step through the relevant code:

(gdb) break Quot.hs:5
Breakpoint 1 at 0x4062c0: file Quot.hs, line 5.
(gdb) run
Starting program: /u/buhr/src/haskell/Quot +RTS -V0
...
Breakpoint 1, Main_zdwquot2_info () at Quot.hs:5
(gdb) display/10i $rip
1: x/10i $rip
=> 0x4062c0 <Main_zdwquot2_info>:   mov    %r14,%rax
   0x4062c3 <Main_zdwquot2_info+3>: shr    $0x3f,%rax
   0x4062c7 <Main_zdwquot2_info+7>: mov    %r14,%rbx
   0x4062ca <Main_zdwquot2_info+10>:    add    %rax,%rbx
   0x4062cd <Main_zdwquot2_info+13>:    sar    %rbx
   0x4062d0 <Main_zdwquot2_info+16>:    jmpq   *0x0(%rbp)
   ...
(gdb) p $r14
$1 = 10
(gdb)

(Here the Main_zdwquot2_info symbol is a mangled name. See below for some info on this.) Anyway, from this disassembly you can can see that the optimized quot2 code uses a couple of right shifts to perform the equivalent of:

((n < 0 : 1 ? 0) + n) >> 1

to get truncation towards zero right for negative numbers.

If div had been used instead, the code would have looked like:

1: x/10i $rip
=> 0x4062c0 <Main_zdwdiv2_info>:    mov    %r14,%rbx
   0x4062c3 <Main_zdwdiv2_info+3>:  sar    %rbx
   0x4062c6 <Main_zdwdiv2_info+6>:  jmpq   *0x0(%rbp)
   ...

So, even though I would have said you can't learn anything from disassembling GHC code, I guess I was wrong. We've already learned:

  • This answer is incorrect, or at least obsolete as of GHC 8.6.5, since `quot` 2 and `div` 2 are optimized to shifts.
  • The frequently repeated advice to use quot in preference to div for maximum efficiency doesn't apply if you're dividing by 2, since the code generated by `div` 2 is more efficient.

Dumping GHC Intermediate Forms

However, it's still really hard to trace Haskell code execution using GDB. If you had set a breakpoint at the main entry point (which is typically called Main_main_info):

(gdb) break Main_main_info
Breakpoint 2 at 0x4063c8: file Div.hs, line 7.
(gdb) run
...
Breakpoint 2, Main_main_info () at Div.hs:7
1: x/10i $rip
=> 0x4063c8 <Main_main_info>:   mov    $0x4ac622,%edi
   0x4063cd <Main_main_info+5>: mov    $0x4a4348,%esi
   0x4063d2 <Main_main_info+10>:    mov    $0x4a73b8,%r14d
   0x4063d8 <Main_main_info+16>:    jmpq   0x433de8 <base_GHCziIOziHandleziText_hPutStrzq_info>
...

it would be easy to get lost. Here, $0x4ac622 is the closure for the Haskell constant True, $0x4a4348 is the closure for the printable string representation of the result of quot2 10, and $0x4a73b8 is the closure for stdout, all being set up for a GHC.IO.Handle.Text.hPutStr' call.

In GHC code, functions are invoked with jmpq, not callq, so you can't really (gdb) next over calls you aren't interested in tracing. Moreover, because of the lazy evaluation model, arguments aren't evaluated before functions are called, so to trace the execution of the quot2 10 call from scratch, you'd need to trace the execution of hPutStr' to find the point where it forces the evaluation of the closure for the printable representation of the result, and then trace the evaluation of that closure to see where it forces the evaluation of quot2 10 itself.

The other issue is that the assembly is really low level compared to the original Haskell code. Though you might be able to learn something about "micro-optimizations", like how `div` 2 is implemented, most Haskell performance problems are caused by higher-level inefficiencies (e.g., unnecessary boxing of values, failure of list fusion, etc.). It would be incredibly difficult to diagnose these problems from a GDB session.

For that reason, most people seeking to understand GHC-compiled code look at various forms of intermediate compiler output. The GHC compiler, as you've probably discovered from your research, performs a number of transformations through intermediate forms. These can be dumped using a series of -ddump-xxx flags. Often, there's a lot of unnecessary detail, so adding the flag -dsuppress-all and sometimes also the flag -dsuppress-uniques can help to keep the noise down. Also, -fforce-recomp is helpful when running ghc multiple times to look at different forms, since it forces a recompilation (and dumping of desired output), even if the source hasn't changed. So, you might dump the output of the desugaring pass using:

ghc -O2 -fforce-recomp -ddump-ds -dsuppress-all -dsuppress-uniques Quot.hs

Anyway, here are the intermediate forms of most interest:

  • The original source code is "desugared" to Core (-ddump-ds), a simplified version of the Haskell language that is still recognizably Haskell. No optimizations have been performed at this point, so this dump is most useful for just getting to know the Core syntax and how it differs from "real" Haskell. Some of the best documentation of Core is in the GHC source tree -- see ghc/compiler/coreSyn/CoreSyn.hs and the definition of the Expr type.
  • Most high-level optimizations are performed on Core by applying Core-to-Core transformation functions. You can learn a lot about a program's likely performance by studying the Core output produced by the simplifier (-ddump-simpl).
  • The simplified Core is transformed into STG (-ddump-stg). Unlike Core, STG is recognizably "not Haskell". It's an untyped functional language that best represents the way a GHC program "actually ticks". With the STG in front of you as a roadmap/reference, you could probably work through lower-level representations, include stepping through the disassembled code with a debugger, and understand what it's actually doing. The best reference I've found for understanding STG is the paper How to make a fast curry. In the paper, STG is basically the language described in Figure 1, and the assembly code generated corresponds to the "evaluation rules" in Figure 2. You might also find I know kung fu: learning STG by example helpful, though I think the examples there are outdated.
  • Below STG, the lower-level forms are CMM (-ddump-cmm) and the assembly (-ddump-asm). CMM is a C-like assembly language. I haven't found any particuarly good/accurate documentation on it, though it's often easier to understand than the assembly, which in turn is usually easier to follow than the disassembly under GDB, because it includes additional symbols that are missing from the debugging info. I was able to identify the closures for True, etc. in the example above by comparing the assembly dumped by GHC with what I was seeing under GDB.

I find I usually have to look at the STG, CMM, and assembly all side-by-side to figure out how some GHC code is working.

So, maybe that'll get you started.

Name Mangling

As noted above, the names in the executable that you'll see in GDB have been mangled from the forms in the dumped compiler output (including -ddump-asm). I don't know if this is documented anywhere, but here's a table I got from the GHC source code (ghc/compiler/utils/Encoding.hs):

'(' ="ZL"   'Z' ="ZZ"  '$' ="zd"  '<' ="zl"  '\\'="zr"
')' ="ZR"   'z' ="zz"  '=' ="ze"  '-' ="zm"  '/' ="zs"
'[' ="ZM"   '&' ="za"  '>' ="zg"  '!' ="zn"  '*' ="zt"
']' ="ZN"   '|' ="zb"  '#' ="zh"  '+' ="zp"  '_' ="zu"
':' ="ZC"   '^' ="zc"  '.' ="zi"  '\''="zq"  '%' ="zv"

so the name Main_zdwquot2_info in GDB correspond to Main_$wquot2_info in dumped GHC output. That still looks mangled, but that's because it's a "system name". As explained in comments in ghc/compiler/basicTypes/OccName:

Making System Names
-------------------

Here's our convention for splitting up the interface file name space:

   d...         dictionary identifiers
                (local variables, so no name-clash worries)

All of these other OccNames contain a mixture of alphabetic
and symbolic characters, and hence cannot possibly clash with
a user-written type or function name

   $f...        Dict-fun identifiers (from inst decls)
   $dmop        Default method for 'op'
   $pnC         n'th superclass selector for class C
   $wf          Worker for function 'f'
   $sf..        Specialised version of f
   D:C          Data constructor for dictionary for class C
   NTCo:T       Coercion connecting newtype T with its representation type
   TFCo:R       Coercion connecting a data family to its representation type R

so the above function was a "worker" for the quot2 function. Often, a function that operates on boxed integers will have a "worker" counterpart that operates on unboxed integers. In our Quot.hs example, this unboxed worker function was all that was needed, so the original quot2 function was dropped in the final compiler output.

Upvotes: 3

Related Questions