Reputation: 31
When using a language, I find it useful to understand exactly how my code is compiled. For example, I like using C because I can inspect the disassembled code in a debugger like GDB and quickly see how the C code I write actually behaves.
The Haskell Wiki has some debugging advice, but I haven't found a way to see the disassembled code generated by GHC. Looking at it with GDB isn't very helpful, since I can't figure out how to find the code for a particular function, figure out where a particular variable is stored, set breakpoints or anything else that I can do with a C program.
My question: How can I disassemble code generated by GHC in a way that's useful? Is there a tool to produce an annotated listing of the code generated by GHC? Is there a way to compile functions in a way that you can see how the compiled code "ticks"?
Some additional background and references:
One reason I want to do this is I think this kind of understanding would be extremely important for getting good performance (e.g., C vs Haskell Collatz conjecture speed comparison). The Haskell wiki entry on performance doesn't go into enough detail to cover the answer to that SO question (namely, optimization of n % 2
and n / 2
by C compilers versus lack of optimization of quot n 2
and rem n 2
in GHC). If I could see the disassembled code, I could figure it out for myself.
Some resources I've found included:
These resources are really useful, but to fully understand them, I feel like I need to see actual compiled GHC code in action.
Upvotes: 2
Views: 820
Reputation: 50864
The GHC compiler provides some support for debugging programs under GDB. If you have a program:
-- Quot.hs
{-# NOINLINE quot2 #-}
quot2 :: Int -> Int
quot2 n = n `quot` 2
main = print $ quot2 10
you can compile it with the -g
flag and run it under GDB using the instructions in the above link:
$ ghc -O2 -g -rtsopts Quot.hs # Note: I'm using GHC 8.6.5
$ gdb --args ./Quot +RTS -V0
You can set a breakpoint at the line quot2 n = n `quot` 2
, disassemble, and step through the relevant code:
(gdb) break Quot.hs:5
Breakpoint 1 at 0x4062c0: file Quot.hs, line 5.
(gdb) run
Starting program: /u/buhr/src/haskell/Quot +RTS -V0
...
Breakpoint 1, Main_zdwquot2_info () at Quot.hs:5
(gdb) display/10i $rip
1: x/10i $rip
=> 0x4062c0 <Main_zdwquot2_info>: mov %r14,%rax
0x4062c3 <Main_zdwquot2_info+3>: shr $0x3f,%rax
0x4062c7 <Main_zdwquot2_info+7>: mov %r14,%rbx
0x4062ca <Main_zdwquot2_info+10>: add %rax,%rbx
0x4062cd <Main_zdwquot2_info+13>: sar %rbx
0x4062d0 <Main_zdwquot2_info+16>: jmpq *0x0(%rbp)
...
(gdb) p $r14
$1 = 10
(gdb)
(Here the Main_zdwquot2_info
symbol is a mangled name. See below for some info on this.) Anyway, from this disassembly you can can see that the optimized quot2
code uses a couple of right shifts to perform the equivalent of:
((n < 0 : 1 ? 0) + n) >> 1
to get truncation towards zero right for negative numbers.
If div
had been used instead, the code would have looked like:
1: x/10i $rip
=> 0x4062c0 <Main_zdwdiv2_info>: mov %r14,%rbx
0x4062c3 <Main_zdwdiv2_info+3>: sar %rbx
0x4062c6 <Main_zdwdiv2_info+6>: jmpq *0x0(%rbp)
...
So, even though I would have said you can't learn anything from disassembling GHC code, I guess I was wrong. We've already learned:
`quot` 2
and `div` 2
are optimized to shifts.quot
in preference to div
for maximum efficiency doesn't apply if you're dividing by 2, since the code generated by `div` 2
is more efficient.However, it's still really hard to trace Haskell code execution using GDB. If you had set a breakpoint at the main entry point (which is typically called Main_main_info
):
(gdb) break Main_main_info
Breakpoint 2 at 0x4063c8: file Div.hs, line 7.
(gdb) run
...
Breakpoint 2, Main_main_info () at Div.hs:7
1: x/10i $rip
=> 0x4063c8 <Main_main_info>: mov $0x4ac622,%edi
0x4063cd <Main_main_info+5>: mov $0x4a4348,%esi
0x4063d2 <Main_main_info+10>: mov $0x4a73b8,%r14d
0x4063d8 <Main_main_info+16>: jmpq 0x433de8 <base_GHCziIOziHandleziText_hPutStrzq_info>
...
it would be easy to get lost. Here, $0x4ac622 is the closure for the Haskell constant True
, $0x4a4348 is the closure for the printable string representation of the result of quot2 10
, and $0x4a73b8 is the closure for stdout
, all being set up for a GHC.IO.Handle.Text.hPutStr'
call.
In GHC code, functions are invoked with jmpq
, not callq
, so you can't really (gdb) next
over calls you aren't interested in tracing. Moreover, because of the lazy evaluation model, arguments aren't evaluated before functions are called, so to trace the execution of the quot2 10
call from scratch, you'd need to trace the execution of hPutStr'
to find the point where it forces the evaluation of the closure for the printable representation of the result, and then trace the evaluation of that closure to see where it forces the evaluation of quot2 10
itself.
The other issue is that the assembly is really low level compared to the original Haskell code. Though you might be able to learn something about "micro-optimizations", like how `div` 2
is implemented, most Haskell performance problems are caused by higher-level inefficiencies (e.g., unnecessary boxing of values, failure of list fusion, etc.). It would be incredibly difficult to diagnose these problems from a GDB session.
For that reason, most people seeking to understand GHC-compiled code look at various forms of intermediate compiler output. The GHC compiler, as you've probably discovered from your research, performs a number of transformations through intermediate forms. These can be dumped using a series of -ddump-xxx
flags. Often, there's a lot of unnecessary detail, so adding the flag -dsuppress-all
and sometimes also the flag -dsuppress-uniques
can help to keep the noise down. Also, -fforce-recomp
is helpful when running ghc
multiple times to look at different forms, since it forces a recompilation (and dumping of desired output), even if the source hasn't changed. So, you might dump the output of the desugaring pass using:
ghc -O2 -fforce-recomp -ddump-ds -dsuppress-all -dsuppress-uniques Quot.hs
Anyway, here are the intermediate forms of most interest:
-ddump-ds
), a simplified version of the Haskell language that is still recognizably Haskell. No optimizations have been performed at this point, so this dump is most useful for just getting to know the Core syntax and how it differs from "real" Haskell. Some of the best documentation of Core
is in the GHC source tree -- see ghc/compiler/coreSyn/CoreSyn.hs
and the definition of the Expr
type.-ddump-simpl
).-ddump-stg
). Unlike Core, STG is recognizably "not Haskell". It's an untyped functional language that best represents the way a GHC program "actually ticks". With the STG in front of you as a roadmap/reference, you could probably work through lower-level representations, include stepping through the disassembled code with a debugger, and understand what it's actually doing. The best reference I've found for understanding STG is the paper How to make a fast curry. In the paper, STG is basically the language described in Figure 1, and the assembly code generated corresponds to the "evaluation rules" in Figure 2. You might also find I know kung fu: learning STG by example helpful, though I think the examples there are outdated.-ddump-cmm
) and the assembly (-ddump-asm
). CMM is a C-like assembly language. I haven't found any particuarly good/accurate documentation on it, though it's often easier to understand than the assembly, which in turn is usually easier to follow than the disassembly under GDB, because it includes additional symbols that are missing from the debugging info. I was able to identify the closures for True
, etc. in the example above by comparing the assembly dumped by GHC with what I was seeing under GDB.I find I usually have to look at the STG, CMM, and assembly all side-by-side to figure out how some GHC code is working.
So, maybe that'll get you started.
As noted above, the names in the executable that you'll see in GDB have been mangled from the forms in the dumped compiler output (including -ddump-asm
). I don't know if this is documented anywhere, but here's a table I got from the GHC source code (ghc/compiler/utils/Encoding.hs
):
'(' ="ZL" 'Z' ="ZZ" '$' ="zd" '<' ="zl" '\\'="zr"
')' ="ZR" 'z' ="zz" '=' ="ze" '-' ="zm" '/' ="zs"
'[' ="ZM" '&' ="za" '>' ="zg" '!' ="zn" '*' ="zt"
']' ="ZN" '|' ="zb" '#' ="zh" '+' ="zp" '_' ="zu"
':' ="ZC" '^' ="zc" '.' ="zi" '\''="zq" '%' ="zv"
so the name Main_zdwquot2_info
in GDB correspond to Main_$wquot2_info
in dumped GHC output. That still looks mangled, but that's because it's a "system name". As explained in comments in ghc/compiler/basicTypes/OccName
:
Making System Names
-------------------
Here's our convention for splitting up the interface file name space:
d... dictionary identifiers
(local variables, so no name-clash worries)
All of these other OccNames contain a mixture of alphabetic
and symbolic characters, and hence cannot possibly clash with
a user-written type or function name
$f... Dict-fun identifiers (from inst decls)
$dmop Default method for 'op'
$pnC n'th superclass selector for class C
$wf Worker for function 'f'
$sf.. Specialised version of f
D:C Data constructor for dictionary for class C
NTCo:T Coercion connecting newtype T with its representation type
TFCo:R Coercion connecting a data family to its representation type R
so the above function was a "worker" for the quot2
function. Often, a function that operates on boxed integers will have a "worker" counterpart that operates on unboxed integers. In our Quot.hs
example, this unboxed worker function was all that was needed, so the original quot2
function was dropped in the final compiler output.
Upvotes: 3