GADTs for a DSL: swings and roundabouts?

Question

The typical examples for the benefits of a GADT are representing the syntax for a DSL; say here on the wiki or the PLDI 2005 paper.

I can see that if you have a AST that's type-correct by construction, writing an eval function is easy.

How to build the GADT handling into a REPL? Or more specifically into a Read-Parse-Typecheck-Eval-Print-Loop? I'm seeing that you just push the complexity from the eval step into earlier steps.

Does GHCi use a GADT internally to represent expressions it's evaluating? (The expressions are a lot chunkier than a typical DSL.)

For one thing, you can't derive Show for a GADT, so for the Print step you either hand-roll Show instances or something like this:

{-# LANGUAGE  GADTs, StandaloneDeriving  #-}

data Term a where
  Lit :: Int -> Term Int
  Inc :: Term Int -> Term Int
  IsZ :: Term Int -> Term Bool
  If :: Term Bool -> Term a -> Term a -> Term a
  Pair :: (Show a, Show b) => Term a -> Term b -> Term (a,b)
  Fst :: (Show b) => Term (a,b) -> Term a
  Snd :: (Show a) => Term (a,b) -> Term b

deriving instance (Show a) => Show (Term a)

(It seems to me those Show constraints tangled in the constructors are already failing to separate concerns.)

I'm more thinking about the user-experience for somebody entering DSL expressions, rather than the programmer's convenience of the eval function. Either:

The user enters expressions directly using the GADT constructors. It's easy to make a syntactically correct but ill-typed mistake (say a mis-placed parens). Then GHCi gives rather unfriendly rejection messages. Or
The REPL takes the input as text and parses it. But for a GADT like that, getting a Read instance is real hard work. So perhaps
The application has two data structures: one being type-mistake-tolerant; the other being the GADT; and the validate step constructs the GADT AST, if it can do so type-safely.

At that last bullet, I seem to be back with 'smart constructors', that GADTs are supposed to improve on(?) What's more I've doubled the work somewhere.

I don't have a 'better way' to approach it. I'm wondering how to approach DSL applications in practice. (For context: I'm thinking about a database query environment, where type inference has to look at the types of the fields in the database to validate what operations on them.)

Addit: after working through the answer from @Alec

I see the code for pretty printing in glambda involves several layers of classes and instances. Something feels wrong here as opposed to what are the claimed advantages of a GADT for an AST. The idea of a (well-typed) AST is you can equally easily: eval it; or pretty-print it; or optimise it; or code-generate from it; etc.

glambda's seems to be aimed at eval'ing (which is fair enough given the purpose of the exercise). I'm wondering ...

Why the need to express the whole syntax for the (E)DSL in one datatype? (The wikibook example starts its straw man doing that data Expr = ...; and rapidly runs into type trouble. Well of course it does; that's never going to work; almost anything would work better than that; I feel cheated.)
If we end up writing classes and instances anyway, why not make each syntax production a separate datatype: data Lit = Lit Int ... data If b a1 a2 = If b a1 a2 ... Then a class IsTerm a c | a -> c where ... (i.e. a FunDep or maybe a Type Family whose instances tell us the Term's result-type.)
Now the EDSL uses the same constructors (the user doesn't care they're from different datatypes); and they apply 'sloppy' type-checking. Pretty printing/error reporting also doesn't need tight typechecking. Eval does, and insists on the IsTerm instances all lining up.

I didn't suggest this approach before, because it seemed to involve too much crufty code. But actually it's no worse than glambda -- that is, when you consider the whole functionality, not just the eval step.

It seems to me a big advantage to express the syntax only once. Furthermore it seems more extensible: add a new datatype per syntax production, rather than breaking open an existing datatype. Oh, and because they're H98 datatypes (no existentials), deriving works fine.

Alec · Accepted Answer

Note that GHCi does not use GADTs to represent expressions. Even GHC's internal core expression type Expr is not a GADT.

DSLs

For the purpose of having a larger more fleshed out example of your Term type, consider glambda. Its Exp type even tracks variables at the type level.

There is a second UExp data type which, as you observed yourself, is what gets actually parsed from the REPL. This type then gets typechecked into Exp and passed on to a continuation with:
```
check :: (MonadError Doc m, MonadReader Globals m)
      => UExp -> (forall t. STy t -> Exp '[] t -> m r)
      -> m r
```
Pretty-printing of UExp and Exp is hand-written, but at least uses the same code (it does this via a PrettyExp class).
The evaluation code itself is beautiful, but I doubt I need to sell you on that. :)

EDSLs

As I understand it, GADTs are splendid for EDSLs (embedded DSLs), since these are just portions of code in a large Haskell program. Yes, type errors can be complicated (and will come from GHC directly), but that's the price you pay for being able to maintain type-level invariants in your code. Consider, for instance, hoopl's representation of basic blocks in a CFG:

data Block n e x where
  BlockCO  :: n C O -> Block n O O          -> Block n C O
  BlockCC  :: n C O -> Block n O O -> n O C -> Block n C C
  BlockOC  ::          Block n O O -> n O C -> Block n O C

  BNil    :: Block n O O
  BMiddle :: n O O                      -> Block n O O
  BCat    :: Block n O O -> Block n O O -> Block n O O
  BSnoc   :: Block n O O -> n O O       -> Block n O O
  BCons   :: n O O       -> Block n O O -> Block n O O

Sure, you open yourself up to nasty type errors, but you also have the ability to track fallthrough information at the type-level. This makes it much easier to think about dataflow problems.

So what...?

The point I'm trying to make is: if your GADT is being constructed from a String (or a custom REPL), you'll have a rough time performing the translation. That's unavoidable because what you are doing is essentially re-implementing a simple type-checker. Your best bet is to confront this head on (as glambda does) and distinguish the parsing from the type-checking.

However, if you can afford to stay within the bounds of Haskell code, you can just hand parsing and typechecking to GHC. IMHO, EDSLs are way cooler and more practical that non-embedded DSLs.

GADTs for a DSL: swings and roundabouts?

Answers (1)

DSLs

EDSLs

So what...?

Related Questions