Blue Nebula
Blue Nebula

Reputation: 1174

How can I remove all the boilerplate introduced by Trees That Grow?

I'm trying to define a programming language in Haskell. I wish to make the AST extensible: users of the AST module (for instance a pretty printer, an interpreter, a compiler, a type system, a language server and so on) should be able to extend it, by adding both new functionalities and new data (new datatypes to extend the syntax as well as new fields to the current data constructors to store data needed by the various components).

I tried to achieve this goal by using Trees That Grow (TTG). It works, but it results in way too much boilerplate. My minimal prototype becomes 10 times larger in terms of lines of code, and this number grows by the AST size times the number of extensions. Changing something minor requires changing several lines of the AST module, while changing something in the way extensibility is implemented would require rewriting most of it.

Is there any way to reduce the amount of boilerplate needed, or ideally remove it altogether?

Example with code of what I have so far

The "base" AST

This is just one small piece of the AST. It's something very similar to JSON, as I decided to start with a small prototype.

module AST ( KeyValue(..), Data(..) ) where

data KeyValue = KV String Data deriving (Show, Eq, Ord)

data Data =
    Null |
    Int Int |
    Num Double |
    Bool Bool |
    String String |
    Array [Data] |
    Object [KeyValue] deriving (Show, Eq, Ord)

The extensible AST, via Trees That Grow

In order to extend it via TTG, the datatypes become something like this:

data KeyValueX x =
    KVX (XKV x) String (DataX x) |
    KeyValueX (XKeyValue x)
 
data DataX x =
    NullX (XNull x) |
    IntX (XInt x) Int |
    NumX (XNum x) Double |
    BoolX (XBool x) Bool |
    StringX (XString x) String |
    ArrayX (XArray x) [DataX x] |
    ObjectX (XObject x) [KeyValueX x] |
    DataX (XData x)

Each of those types with a name starting in X is a type family:

type family XKV x
type family XKeyValue x
type family XNull x
type family XInt x
type family XNum x
type family XBool x
type family XString x
type family XArray x
type family XObject x
type family XData x

Further each of them requires to be listed in a type that makes it easier to derive classes:

type ForallX (c :: Type -> Constraint) x = (
    c (XKV x), c (XKeyValue x),
    c (XNull x), c (XInt x), c (XNum x), c (XBool x),
    c (XString x), c (XArray x), c (XObject x), c (XData x)
    )

-- now we can do:
deriving instance ForallX Show x => Show (KeyValueX x)
deriving instance ForallX Show x => Show (DataX x)
deriving instance ForallX Eq x => Eq (KeyValueX x)
deriving instance ForallX Eq x => Eq (DataX x)
deriving instance ForallX Ord x => Ord (KeyValueX x)
deriving instance ForallX Ord x => Ord (DataX x)

And of course everything requires to be exported:

module AST ( KeyValueX(..), DataX(..),
             XKV, XKeyValue,
             XNull, XNum, XBool, XString, XArray, XObject, XData,
             ForallX
           ) where

An extension to the AST

This is what is needed in order to create an extension. Even just the "identity" extension (UnDecorated) which needs to be provided.

For every instance you need to implement a typeclass for the type family of every type and data constructor:

data UD  -- UnDecorated, identity extension
 
type instance XKV UD = ()
type instance XKeyValue UD = Void
 
type instance XData UD = Void
type instance XNull UD = ()
type instance XInt UD = ()
type instance XNum UD = ()
type instance XBool UD = ()
type instance XString UD = ()
type instance XArray UD = ()
type instance XObject UD = ()

Then, in order to do things properly and ergonomic enough for the user, you need patterns and type aliases for every data constructor and data type:

type KeyValue = KeyValueX UD
pattern KV :: String -> Data -> KeyValue
pattern KV x y <- KVX _ x y where KV x y = KVX () x y
 
type Data = DataX UD
pattern Null :: Data
pattern Null <- NullX _ where Null = NullX ()
pattern DInt :: Int -> Data
pattern DInt x <- IntX _ x where DInt x = IntX () x
pattern DNum :: Double -> Data
pattern DNum x <- NumX _ x where DNum x = NumX () x
pattern DBool :: Bool -> Data
pattern DBool x <- BoolX _ x where DBool x = BoolX () x
pattern DString :: String -> Data
pattern DString x <- StringX _ x where DString x = StringX () x
pattern Array :: [Data] -> Data
pattern Array x <- ArrayX _ x where Array x = ArrayX () x
pattern Object :: [KeyValue] -> Data
pattern Object x <- ObjectX _ x where Object x = ObjectX () x

And of course all this stuff should be exported too:

module AST ( ...,
              UD,
              KeyValue, Data,
              pattern KV,
              pattern Null, pattern Num, pattern Bool,
              pattern String, pattern Array, pattern Object
           ) where

Summary

TTG turned my simple 10-line module, into a module of more than 100 lines where 90% of the code is boring, hard-to-maintain boilerplate:

I would estimate that the whole language could take a couple dozen types with a total of more than a hundred of data constructors. Then I would need to define a handful of extensions to the AST. A non-extensible AST would take around 100 lines (as an order of magnitude), while one extended via TTG would take around 10,000. All the required boilerplate would make all of this unmanageable for me.

Question

Is there any way to reduce the amount of boilerplate needed, or ideally remove it altogether?

Otherwise are there any alternative ways to make my AST extensible without requiring this much work?

Upvotes: 12

Views: 422

Answers (2)

songzh
songzh

Reputation: 35

There is a project called derive-ttg

https://github.com/HaskellZhangSong/derive-ttg

For a type:

data Lam a = Var String a
           | App { t1 :: (Lam a) , t2::(Lam a), ext :: a}  
           | Abs String (Lam a) a
         deriving Eq

there is a function derive_ttg to generated LamX which can be decorated by the type family

   derive_ttg ''Lam []
  ======>
    data LamX (a_aRb :: ghc-prim:GHC.Types.Type) eps_a2vI
      = VarX (XVar eps_a2vI) String a_aRb |
        AppX {appX :: (XApp eps_a2vI),
              t1X :: (LamX eps_a2vI a_aRb),
              t2X :: (LamX eps_a2vI a_aRb),
              extX :: a_aRb} |
        AbsX (XAbs eps_a2vI) String (LamX eps_a2vI a_aRb) a_aRb |
        LamX (XLam eps_a2vI)
    type family XVar eps
    type family XApp eps
    type family XAbs eps
    type family XLam eps
    type ForallXLam (phi :: ghc-prim:GHC.Types.Type
                            -> Constraint) esp =
        (phi (XVar esp), phi (XApp esp), phi (XAbs esp), phi (XLam esp))

derive_simple_decorator can be used to generate what's needed to decorate a data constructor.

   derive_simple_decorator ''Lam "Parse" [('App, ''Bool)] ''()
  ======>
    type LamParse a_a2z3 = LamX a_a2z3
    data Parse
    type instance XApp Parse = Bool
    type instance XVar Parse = ()
    type instance XAbs Parse = ()
    type instance XLam Parse = ()

    derive_simple_decorator ''Lam "TC" [(''Lam, ''Bool)] ''()
  ======>
    type LamTC a_a2Gx = LamX a_a2Gx
    data TC
    type instance XLam TC = Bool
    type instance XVar TC = ()
    type instance XApp TC = ()
    type instance XAbs TC = ()

For your example:

import Data.Void
data KeyValue = KV String Data deriving (Show, Eq, Ord)

data Data =
    Null |
    Int Int |
    Num Double |
    Bool Bool |
    String String |
    Array [Data] |
    Object [KeyValue] deriving (Show, Eq, Ord)

derive_ttg ''KeyValue []
derive_ttg ''Data [''KeyValue]
derive_simple_decorator ''Data "UD" [(''Data, ''Void)] ''()

Will get

    derive_ttg ''KeyValue []
  ======>
    data KeyValueX eps_a54X
      = KVX (XKV eps_a54X) String Data | KeyValueX (XKeyValue eps_a54X)
    type family XKV eps
    type family XKeyValue eps
    type ForallXKeyValue (phi :: ghc-prim:GHC.Types.Type
                                 -> Constraint) esp =
        (phi (XKV esp), phi (XKeyValue esp))
/Users/songzh/project/haskell/derive-ttg/test/Spec.hs:35:1-30: Splicing declarations
    derive_ttg ''Data [''KeyValue]
  ======>
    data DataX eps_a56h
      = NullX (XNull eps_a56h) |
        IntX (XInt eps_a56h) Int |
        NumX (XNum eps_a56h) Double |
        BoolX (XBool eps_a56h) Bool |
        StringX (XString eps_a56h) String |
        ArrayX (XArray eps_a56h) [DataX eps_a56h] |
        ObjectX (XObject eps_a56h) [KeyValueX eps_a56h] |
        DataX (XData eps_a56h)
    type family XNull eps
    type family XInt eps
    type family XNum eps
    type family XBool eps
    type family XString eps
    type family XArray eps
    type family XObject eps
    type family XData eps
    type ForallXData (phi :: ghc-prim:GHC.Types.Type
                             -> Constraint) esp =
        (phi (XNull esp), phi (XInt esp), phi (XNum esp), phi (XBool esp),
         phi (XString esp), phi (XArray esp), phi (XObject esp),
         phi (XData esp))

   derive_simple_decorator ''Data "UD" [(''Data, ''Void)] ''()
  ======>
    type DataUD = DataX
    data UD
    type instance XData UD = Void
    type instance XNull UD = ()
    type instance XInt UD = ()
    type instance XNum UD = ()
    type instance XBool UD = ()
    type instance XString UD = ()
    type instance XArray UD = ()
    type instance XObject UD = ()

The pattern synonym generation has not been implemented yet. I also encounter some problems with deriving class instances. However, I believe it can be solved by derive-topdown package. A lot of features and details are not settled yet in this package currently. However, I think the boilerplate of TTG can be reduced or removed by a template Haskell library in the end.

Upvotes: 1

Li-yao Xia
Li-yao Xia

Reputation: 33519

You can merge all of the type families into one indexed by a symbol:

data KeyValueX x =
    KVX (X "KVX" x) String (DataX x) |
    KeyValueX (X "KeyValueX" x)
  deriving Generic
 
data DataX x =
    NullX (X "NullX" x) |
    IntX (X "IntX" x) Int |
    NumX (X "NumX" x) Double |
    BoolX (X "BoolX" x) Bool |
    StringX (X "StringX" x) String |
    ArrayX (X "ArrayX" x) [DataX x] |
    ObjectX (X "ObjectX" x) [KeyValueX x] |
    DataX (X "DataX" x)
  deriving Generic

--

type family X (s :: k) (x :: l) :: Type

Use generics to grab all of the constructor names:

type ForAllX c x = (AllX c (CNames (DataX x)) x, AllX c (CNames (KeyValueX x)) x)

deriving instance ForAllX Eq x => Eq (DataX x)
deriving instance ForAllX Eq x => Eq (KeyValueX x)

-- CNames defined using generics, below

All of the boilerplate up to that point could also be generated from the "base AST" using Template Haskell.

Having only one type family makes it easy to define extensions with catch-all clauses:

data UD

type instance X s UD = XUD s

type family XUD (s :: Symbol) :: Type where
  XUD "KeyValueX" = Void
  XUD "DataX" = Void
  XUD _ = ()

As for the patterns, maybe just exposing the constructors is not so bad? GHC does that.

Imports and generics code to make this answer self-contained:

{-# LANGUAGE
  DataKinds,
  DeriveGeneric,
  PolyKinds,
  StandaloneDeriving,
  TypeFamilies,
  UndecidableInstances #-}
module T where

import Data.Kind (Constraint, Type)
import Data.Void
import GHC.Generics
import GHC.TypeLits

type CNames a = GCNames (Rep a)

type family GCNames (f :: Type -> Type) :: [Symbol] where
  GCNames (M1 D c f) = GCNames f
  GCNames (f :+: g) = GCNames f ++ GCNames g
  GCNames (M1 C (MetaCons name _ _) f) = '[name]

type family (xs :: [k]) ++ (ys :: [k]) :: [k] where
  '[] ++ ys = ys
  (x ': xs) ++ ys = x ': (xs ++ ys)

type family AllX (c :: Type -> Constraint) (xs :: [Symbol]) (x :: l) :: Constraint where
  AllX c '[] x = ()
  AllX c (s ': ss) x = (c (X s x), AllX c ss x)

Gist: https://gist.github.com/Lysxia/3f6781b3a307a7e0c564920d6277bee2

Upvotes: 7

Related Questions