sinoTrinity

Reputation: 1195

How to implement source map in a compiler?

I'm implementing a compiler in Haskell that compiles a source language to an assembly-like target language.

For debugging purposes, I need a source map that maps each target-language instruction to its corresponding source position (line and column).

I've searched extensively through compiler implementations, but none of the ones I found includes a source map.

Can anyone please point me in the right direction on how to generate a source map?

Code samples, books, etc. are all welcome. Haskell is preferred, but other languages are fine too.

Upvotes: 3

Views: 500

Answers (1)

SK-logic

Reputation: 9715

The details depend on the compilation technique you're applying.

If you're doing it via a sequence of transforms over intermediate languages, as most sane compilers do these days, your options are the following:

  • Annotate all intermediate representation (IR) nodes with source location information. Introduce special nodes for preserving variable names (the names will all disappear once you do, say, an SSA transform, so you need to track their origins separately)
  • Inject tons of intrinsic function calls (see how it's done in LLVM IR) instead of annotating each node
  • Do a mixture of the above

The first option can even be done nearly automatically: if each transform copies the source location of the original node into every node it creates from it, you only have to adjust a few non-trivial annotations by hand.
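To make the first option concrete, here is a minimal sketch, assuming a toy expression language and a tiny stack machine as the target. All names (`Pos`, `Expr`, `Instr`, `lower`, `sourceMap`) are made up for illustration and do not come from any particular compiler.

```haskell
-- Hypothetical toy types, for illustration only.

-- A source position: line and column.
data Pos = Pos { posLine :: Int, posCol :: Int }
  deriving (Eq, Show)

-- Source-level expressions; every node carries the position it was parsed at.
data Expr
  = Lit Pos Int
  | Add Pos Expr Expr
  | Neg Pos Expr
  deriving (Eq, Show)

-- Target instructions for a tiny stack machine.
data Instr = Push Int | AddI | SubI
  deriving (Eq, Show)

-- Lowering pairs each emitted instruction with the position of the node
-- that produced it, so the location survives the transform.
lower :: Expr -> [(Instr, Pos)]
lower (Lit p n)   = [(Push n, p)]
lower (Add p a b) = lower a ++ lower b ++ [(AddI, p)]
-- Negation is lowered to 0 - e; both new instructions inherit Neg's position.
lower (Neg p e)   = [(Push 0, p)] ++ lower e ++ [(SubI, p)]

-- The source map: target instruction index -> source position.
sourceMap :: [(Instr, Pos)] -> [(Int, Pos)]
sourceMap = zip [0 ..] . map snd
```

The only rule the transform follows is "every new node gets the position of the node it was created from". The `Neg` case is the kind of spot where you might want to adjust the annotation by hand.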

Also keep in mind that some optimisations may render your source location information meaningless. For example, value numbering collapses a number of similar expressions into one, probably keeping the source location of just one arbitrary origin. The same goes for rematerialisation.
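A rough illustration of the value numbering problem, assuming a deliberately naive pass keyed on expression structure. The types and the functions `valueNumber` and `valueNumberAll` are hypothetical; the second one is just one possible workaround (a one-to-many map), not something a real compiler necessarily does.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical toy types, for illustration only.
data Pos = Pos Int Int
  deriving (Eq, Show)

-- Expression structure without locations, used as the value-numbering key.
data Shape = Lit Int | Add Shape Shape
  deriving (Eq, Ord, Show)

-- Naive value numbering over (expression, origin) pairs: structurally equal
-- expressions share one entry, and only the first origin seen is kept.
valueNumber :: [(Shape, Pos)] -> Map.Map Shape Pos
valueNumber =
  foldl (\m (s, p) -> Map.insertWith (\_new old -> old) s p m) Map.empty

-- A variant that keeps every origin, in the order they were seen.
valueNumberAll :: [(Shape, Pos)] -> Map.Map Shape [Pos]
valueNumberAll =
  foldl (\m (s, p) -> Map.insertWith (\new old -> old ++ new) s [p] m) Map.empty
```

With two occurrences of `1 + 2` at different positions, `valueNumber` silently drops the second origin, while `valueNumberAll` keeps both and leaves it to the debugger to decide what to show.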

With Haskell, the first approach will result in a lot of boilerplate in your ADT definitions and pattern matching, even if you sugar-coat it with something like Scrap Your Boilerplate (SYB). So I'd recommend the second approach, which is extensively documented and nicely demonstrated in LLVM IR.
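A minimal sketch of the second approach, assuming a flat IR where location markers are ordinary pseudo-instructions. This only borrows the general idea from LLVM; the names (`Loc`, `Op`, `emit`) are invented here and are not LLVM's actual intrinsics or API.

```haskell
-- Hypothetical toy types, for illustration only.
data Pos = Pos Int Int
  deriving (Eq, Show)

-- Real operations of the target machine.
data Op = Push Int | AddI
  deriving (Eq, Show)

-- The IR is a flat list where a Loc marker says "the operations that
-- follow originate at this source position". Operations carry no location.
data IR = Loc Pos | Op Op
  deriving (Eq, Show)

-- Code generation drops the markers and records, for each emitted
-- operation index, the most recent marker seen (if any).
emit :: [IR] -> ([Op], [(Int, Pos)])
emit = go 0 Nothing
  where
    go _ _ [] = ([], [])
    go i _ (Loc p : rest) = go i (Just p) rest
    go i cur (Op o : rest) =
      let (ops, m) = go (i + 1) cur rest
      in (o : ops, maybe m (\p -> (i, p) : m) cur)
```

The upside is that `Op` stays free of annotations, so passes that pattern match on operations need no extra fields. The downside is that every pass that reorders or deletes instructions has to decide what to do with the markers.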

Upvotes: 5
