Reputation: 2185
(Misleading title: it's only one of a plethora of inter-related similar questions below: these sound like asking for a full reference manual but keep in mind for this topic there is no reference manual other than the entirety of GHC's source-codes of its STG pipeline stage, and the collective accumulated experience of others/"insiders"..)
I'm exploring "transpiling" Haskell (from scratch for fun/learning, ignoring existing projects; target language/s similarly high-level / "already-fit-for-STG-machine" with existing GC + lambdas/func-values + closures) and so I'm trying to become ever more familiar with GHC's STG IR. Having repeatedly gone through the dozen-or-two online articles/videos of varying age, depth, detail that actually deal with the topic (plus the original paper, plus StgSyn.hs), and understanding many-perhaps-most basic principles, seeing -ddump-stg
ed output still baffles me in various parts (I won't manually parse it but reuse GHC API's in-memory AST later on of course) --- mostly I think I'm stuck mapping my "roughly known" concepts to the "still-foreign" abbreviated/codified identifiers of that IR. If you know your way around STG a bit, mind looking at the following mini-sample to clarify a few open questions and help further solidify my (and future searchers') grasp?
From a most simple .hs module, I have -ddump-stg
ed twice, first (on the left) with -O0
and then (on the right) with -O2
, both captured in this diff.
Walking through everything def-by-def..
Lines L_|R5-11: so in O2, testX1
and testX2
seem to be global constants/literals for the integers 4 and 5 --- O0 doesn't have them. Curious!
Is Str=DmdType
something about strictness? "Strictness is of type on-demand" or some such? But then a top-level/heap-ish/"global" constant literal can't be "lazy" can it.. (one of the things where I can't just casually Ctrl+F in StgSyn.hs --- it's not in there! which is odd in its own way, how come there's STG syntax not in StgSyn.hs)
Caf
have a rough idea about constant-applicative-forms, but Unf=OtherCon
? "Other constructor" (unboxed/native Type.S#
-related?) ..
Line L6|R14: Surprised to still see type-class information in there (Num
), is that "just info/annotation" or is this crucial for any of the built-in code-gens to set up some "dictionary" lookup machinery at runtime? (I'd sure hope by the late STG / pre-CMM stage that would be resolved and inlined already where possible at least in O2. After all GHC has also decided to type-default 4
and 5
to Integer
). Generally speaking I understand STG is "untyped" other than denoting prim types, saturated cons, perhaps strings (looks like it later on at the bottom), so such "typeclass" annotations can only be.. I guess for readers to find their way around the ddump-ed *.stg. But correct me if not.
GblId
probably just "global identifier" aka top-level CAF right? Arity clear.
Line L7|R18: now Str=DmdType
for testX
is, only in O2, followed by a freakish <S(LLC(C(S))LLLL),U(1*C1(C1(U)),A,1*C1(C1(U)),A,A,A,C(U))><L,U>
! What's that, SKI calculus? ;D no seriously, LLC.. LLLL.. stack or other memory layout hints for CMM? Any idea? Must be some optimization, would like to understand which-and-how..
Line L8|R20: $dNum_sGM
(left) and $dNum_sIx
(right) have me a bit concerned, they don't seem to be "defined at the module level" here anywhere. Typeclass "method dispatch dictionary lookup" kind of thing? Would eg. CMM take this together with the above Num
annotation to set things up? It always appears together with the input
func arg.
The whole function "body" for both left and right can be seen here essentially as "3 let
s with a lambda-ish form for 3 atoms, 2 of which are statically known literal-constants" --- I suppose this is standard and to be expected in the STG IR AST? For the first of these, funnily enough we could say that O0 has "inlined the global (what is testX1 or testX2 in O2) and O2 hasn't" (making the latter much shorter as that applies to both these constant literals).
I've only ever seen Occ=Once
, what are the others and how to interpret? Once
for one isn't even in StgSyn.hs..
Now LclId
a counterpart to the earlier encountered GblId
. That's denoting the scope of the identifier? Could it also be anything else, in this expression context? As in: if traversing the AST I roughly know how deep I am, I can ignore this since if I'm at the top-level it must be GblId
and otherwise LclId
? Hm.. maybe better take what STG gives me but then I need to be sure about the semantics and possibilities.. guys, using StgSyn.hs I have the wrong source file, right? Nothing on this in there either.. (always hopeful as its comments are quite well-done)
the rest is just metadata as string constants, OK.. oh wait, look at O2, there's Str=DmdType m1
and Str=DmdType m
, what's the m
/m1
about, another thing I don't see "defined anywhere at the module level" here? And it's not in O0..
still going strong? Merely a bonus question (for now), tell us about srt:SRT:[]
;)
Upvotes: 3
Views: 111
Reputation: 4231
Just a few tidbits - a full answer is quite beyond my knowledge.
The type of your function is
testX :: GHC.Num.Num a => a -> a
It’s compiled to a function with two arguments: a dictionary of the Num
type class, and the actual argument.
The $d…
names stand for dictionaries of type class instances. The <S(LLC(C(S))LLLL),…
annotations are strictness information about the function arguments. They basically say which part of the argument will be used by your function and which not. Looks a bit weird here because it contains information about all the class instance members.
Some of this is explained here:
https://ghc.haskell.org/trac/ghc/wiki/Commentary/Compiler/Demand
The str:STR:
is the „Static reference table“, i.e. list of free variables of the expression - in your case, always []
.
Upvotes: 1