AntC
AntC

Reputation: 2806

GADTs for a DSL: swings and roundabouts?

The typical examples for the benefits of a GADT are representing the syntax for a DSL; say here on the wiki or the PLDI 2005 paper.

I can see that if you have a AST that's type-correct by construction, writing an eval function is easy.

How to build the GADT handling into a REPL? Or more specifically into a Read-Parse-Typecheck-Eval-Print-Loop? I'm seeing that you just push the complexity from the eval step into earlier steps.

Does GHCi use a GADT internally to represent expressions it's evaluating? (The expressions are a lot chunkier than a typical DSL.)

I'm more thinking about the user-experience for somebody entering DSL expressions, rather than the programmer's convenience of the eval function. Either:

At that last bullet, I seem to be back with 'smart constructors', that GADTs are supposed to improve on(?) What's more I've doubled the work somewhere.

I don't have a 'better way' to approach it. I'm wondering how to approach DSL applications in practice. (For context: I'm thinking about a database query environment, where type inference has to look at the types of the fields in the database to validate what operations on them.)

Addit: after working through the answer from @Alec

I see the code for pretty printing in glambda involves several layers of classes and instances. Something feels wrong here as opposed to what are the claimed advantages of a GADT for an AST. The idea of a (well-typed) AST is you can equally easily: eval it; or pretty-print it; or optimise it; or code-generate from it; etc.

glambda's seems to be aimed at eval'ing (which is fair enough given the purpose of the exercise). I'm wondering ...

I didn't suggest this approach before, because it seemed to involve too much crufty code. But actually it's no worse than glambda -- that is, when you consider the whole functionality, not just the eval step.

It seems to me a big advantage to express the syntax only once. Furthermore it seems more extensible: add a new datatype per syntax production, rather than breaking open an existing datatype. Oh, and because they're H98 datatypes (no existentials), deriving works fine.

Upvotes: 5

Views: 587

Answers (1)

Alec
Alec

Reputation: 32309

Note that GHCi does not use GADTs to represent expressions. Even GHC's internal core expression type Expr is not a GADT.

DSLs

For the purpose of having a larger more fleshed out example of your Term type, consider glambda. Its Exp type even tracks variables at the type level.

  • There is a second UExp data type which, as you observed yourself, is what gets actually parsed from the REPL. This type then gets typechecked into Exp and passed on to a continuation with:

    check :: (MonadError Doc m, MonadReader Globals m)
          => UExp -> (forall t. STy t -> Exp '[] t -> m r)
          -> m r
    
  • Pretty-printing of UExp and Exp is hand-written, but at least uses the same code (it does this via a PrettyExp class).

  • The evaluation code itself is beautiful, but I doubt I need to sell you on that. :)

EDSLs

As I understand it, GADTs are splendid for EDSLs (embedded DSLs), since these are just portions of code in a large Haskell program. Yes, type errors can be complicated (and will come from GHC directly), but that's the price you pay for being able to maintain type-level invariants in your code. Consider, for instance, hoopl's representation of basic blocks in a CFG:

data Block n e x where
  BlockCO  :: n C O -> Block n O O          -> Block n C O
  BlockCC  :: n C O -> Block n O O -> n O C -> Block n C C
  BlockOC  ::          Block n O O -> n O C -> Block n O C

  BNil    :: Block n O O
  BMiddle :: n O O                      -> Block n O O
  BCat    :: Block n O O -> Block n O O -> Block n O O
  BSnoc   :: Block n O O -> n O O       -> Block n O O
  BCons   :: n O O       -> Block n O O -> Block n O O

Sure, you open yourself up to nasty type errors, but you also have the ability to track fallthrough information at the type-level. This makes it much easier to think about dataflow problems.

So what...?

The point I'm trying to make is: if your GADT is being constructed from a String (or a custom REPL), you'll have a rough time performing the translation. That's unavoidable because what you are doing is essentially re-implementing a simple type-checker. Your best bet is to confront this head on (as glambda does) and distinguish the parsing from the type-checking.

However, if you can afford to stay within the bounds of Haskell code, you can just hand parsing and typechecking to GHC. IMHO, EDSLs are way cooler and more practical that non-embedded DSLs.

Upvotes: 5

Related Questions