Reputation: 1174
I'm trying to define a programming language in Haskell. I wish to make the AST extensible: users of the AST module (for instance a pretty printer, an interpreter, a compiler, a type system, a language server and so on) should be able to extend it, by adding both new functionalities and new data (new datatypes to extend the syntax as well as new fields to the current data constructors to store data needed by the various components).
I tried to achieve this goal by using Trees That Grow (TTG). It works, but it results in way too much boilerplate. My minimal prototype becomes 10 times larger in terms of lines of code, and this number grows by the AST size times the number of extensions. Changing something minor requires changing several lines of the AST module, while changing something in the way extensibility is implemented would require rewriting most of it.
Is there any way to reduce the amount of boilerplate needed, or ideally remove it altogether?
This is just one small piece of the AST. It's something very similar to JSON, as I decided to start with a small prototype.
module AST ( KeyValue(..), Data(..) ) where
data KeyValue = KV String Data deriving (Show, Eq, Ord)
data Data =
Null |
Int Int |
Num Double |
Bool Bool |
String String |
Array [Data] |
Object [KeyValue] deriving (Show, Eq, Ord)
In order to extend it via TTG, the datatypes become something like this:
data KeyValueX x =
KVX (XKV x) String (DataX x) |
KeyValueX (XKeyValue x)
data DataX x =
NullX (XNull x) |
IntX (XInt x) Int |
NumX (XNum x) Double |
BoolX (XBool x) Bool |
StringX (XString x) String |
ArrayX (XArray x) [DataX x] |
ObjectX (XObject x) [KeyValueX x] |
DataX (XData x)
Each of those types with a name starting in X
is a type family
:
type family XKV x
type family XKeyValue x
type family XNull x
type family XInt x
type family XNum x
type family XBool x
type family XString x
type family XArray x
type family XObject x
type family XData x
Further each of them requires to be listed in a type that makes it easier to derive classes:
type ForallX (c :: Type -> Constraint) x = (
c (XKV x), c (XKeyValue x),
c (XNull x), c (XInt x), c (XNum x), c (XBool x),
c (XString x), c (XArray x), c (XObject x), c (XData x)
)
-- now we can do:
deriving instance ForallX Show x => Show (KeyValueX x)
deriving instance ForallX Show x => Show (DataX x)
deriving instance ForallX Eq x => Eq (KeyValueX x)
deriving instance ForallX Eq x => Eq (DataX x)
deriving instance ForallX Ord x => Ord (KeyValueX x)
deriving instance ForallX Ord x => Ord (DataX x)
And of course everything requires to be exported:
module AST ( KeyValueX(..), DataX(..),
XKV, XKeyValue,
XNull, XNum, XBool, XString, XArray, XObject, XData,
ForallX
) where
This is what is needed in order to create an extension. Even just the "identity" extension (UnDecorated) which needs to be provided.
For every instance you need to implement a typeclass for the type family of every type and data constructor:
data UD -- UnDecorated, identity extension
type instance XKV UD = ()
type instance XKeyValue UD = Void
type instance XData UD = Void
type instance XNull UD = ()
type instance XInt UD = ()
type instance XNum UD = ()
type instance XBool UD = ()
type instance XString UD = ()
type instance XArray UD = ()
type instance XObject UD = ()
Then, in order to do things properly and ergonomic enough for the user, you need patterns and type aliases for every data constructor and data type:
type KeyValue = KeyValueX UD
pattern KV :: String -> Data -> KeyValue
pattern KV x y <- KVX _ x y where KV x y = KVX () x y
type Data = DataX UD
pattern Null :: Data
pattern Null <- NullX _ where Null = NullX ()
pattern DInt :: Int -> Data
pattern DInt x <- IntX _ x where DInt x = IntX () x
pattern DNum :: Double -> Data
pattern DNum x <- NumX _ x where DNum x = NumX () x
pattern DBool :: Bool -> Data
pattern DBool x <- BoolX _ x where DBool x = BoolX () x
pattern DString :: String -> Data
pattern DString x <- StringX _ x where DString x = StringX () x
pattern Array :: [Data] -> Data
pattern Array x <- ArrayX _ x where Array x = ArrayX () x
pattern Object :: [KeyValue] -> Data
pattern Object x <- ObjectX _ x where Object x = ObjectX () x
And of course all this stuff should be exported too:
module AST ( ...,
UD,
KeyValue, Data,
pattern KV,
pattern Null, pattern Num, pattern Bool,
pattern String, pattern Array, pattern Object
) where
TTG turned my simple 10-line module, into a module of more than 100 lines where 90% of the code is boring, hard-to-maintain boilerplate:
I would estimate that the whole language could take a couple dozen types with a total of more than a hundred of data constructors. Then I would need to define a handful of extensions to the AST. A non-extensible AST would take around 100 lines (as an order of magnitude), while one extended via TTG would take around 10,000. All the required boilerplate would make all of this unmanageable for me.
Is there any way to reduce the amount of boilerplate needed, or ideally remove it altogether?
Otherwise are there any alternative ways to make my AST extensible without requiring this much work?
Upvotes: 12
Views: 422
Reputation: 35
There is a project called derive-ttg
https://github.com/HaskellZhangSong/derive-ttg
For a type:
data Lam a = Var String a
| App { t1 :: (Lam a) , t2::(Lam a), ext :: a}
| Abs String (Lam a) a
deriving Eq
there is a function derive_ttg
to generated LamX which can be decorated by the type family
derive_ttg ''Lam []
======>
data LamX (a_aRb :: ghc-prim:GHC.Types.Type) eps_a2vI
= VarX (XVar eps_a2vI) String a_aRb |
AppX {appX :: (XApp eps_a2vI),
t1X :: (LamX eps_a2vI a_aRb),
t2X :: (LamX eps_a2vI a_aRb),
extX :: a_aRb} |
AbsX (XAbs eps_a2vI) String (LamX eps_a2vI a_aRb) a_aRb |
LamX (XLam eps_a2vI)
type family XVar eps
type family XApp eps
type family XAbs eps
type family XLam eps
type ForallXLam (phi :: ghc-prim:GHC.Types.Type
-> Constraint) esp =
(phi (XVar esp), phi (XApp esp), phi (XAbs esp), phi (XLam esp))
derive_simple_decorator
can be used to generate what's needed to decorate a data constructor.
derive_simple_decorator ''Lam "Parse" [('App, ''Bool)] ''()
======>
type LamParse a_a2z3 = LamX a_a2z3
data Parse
type instance XApp Parse = Bool
type instance XVar Parse = ()
type instance XAbs Parse = ()
type instance XLam Parse = ()
derive_simple_decorator ''Lam "TC" [(''Lam, ''Bool)] ''()
======>
type LamTC a_a2Gx = LamX a_a2Gx
data TC
type instance XLam TC = Bool
type instance XVar TC = ()
type instance XApp TC = ()
type instance XAbs TC = ()
For your example:
import Data.Void
data KeyValue = KV String Data deriving (Show, Eq, Ord)
data Data =
Null |
Int Int |
Num Double |
Bool Bool |
String String |
Array [Data] |
Object [KeyValue] deriving (Show, Eq, Ord)
derive_ttg ''KeyValue []
derive_ttg ''Data [''KeyValue]
derive_simple_decorator ''Data "UD" [(''Data, ''Void)] ''()
Will get
derive_ttg ''KeyValue []
======>
data KeyValueX eps_a54X
= KVX (XKV eps_a54X) String Data | KeyValueX (XKeyValue eps_a54X)
type family XKV eps
type family XKeyValue eps
type ForallXKeyValue (phi :: ghc-prim:GHC.Types.Type
-> Constraint) esp =
(phi (XKV esp), phi (XKeyValue esp))
/Users/songzh/project/haskell/derive-ttg/test/Spec.hs:35:1-30: Splicing declarations
derive_ttg ''Data [''KeyValue]
======>
data DataX eps_a56h
= NullX (XNull eps_a56h) |
IntX (XInt eps_a56h) Int |
NumX (XNum eps_a56h) Double |
BoolX (XBool eps_a56h) Bool |
StringX (XString eps_a56h) String |
ArrayX (XArray eps_a56h) [DataX eps_a56h] |
ObjectX (XObject eps_a56h) [KeyValueX eps_a56h] |
DataX (XData eps_a56h)
type family XNull eps
type family XInt eps
type family XNum eps
type family XBool eps
type family XString eps
type family XArray eps
type family XObject eps
type family XData eps
type ForallXData (phi :: ghc-prim:GHC.Types.Type
-> Constraint) esp =
(phi (XNull esp), phi (XInt esp), phi (XNum esp), phi (XBool esp),
phi (XString esp), phi (XArray esp), phi (XObject esp),
phi (XData esp))
derive_simple_decorator ''Data "UD" [(''Data, ''Void)] ''()
======>
type DataUD = DataX
data UD
type instance XData UD = Void
type instance XNull UD = ()
type instance XInt UD = ()
type instance XNum UD = ()
type instance XBool UD = ()
type instance XString UD = ()
type instance XArray UD = ()
type instance XObject UD = ()
The pattern synonym generation has not been implemented yet. I also encounter some problems with deriving class instances. However, I believe it can be solved by derive-topdown
package. A lot of features and details are not settled yet in this package currently. However, I think the boilerplate of TTG can be reduced or removed by a template Haskell library in the end.
Upvotes: 1
Reputation: 33519
You can merge all of the type families into one indexed by a symbol:
data KeyValueX x =
KVX (X "KVX" x) String (DataX x) |
KeyValueX (X "KeyValueX" x)
deriving Generic
data DataX x =
NullX (X "NullX" x) |
IntX (X "IntX" x) Int |
NumX (X "NumX" x) Double |
BoolX (X "BoolX" x) Bool |
StringX (X "StringX" x) String |
ArrayX (X "ArrayX" x) [DataX x] |
ObjectX (X "ObjectX" x) [KeyValueX x] |
DataX (X "DataX" x)
deriving Generic
--
type family X (s :: k) (x :: l) :: Type
Use generics to grab all of the constructor names:
type ForAllX c x = (AllX c (CNames (DataX x)) x, AllX c (CNames (KeyValueX x)) x)
deriving instance ForAllX Eq x => Eq (DataX x)
deriving instance ForAllX Eq x => Eq (KeyValueX x)
-- CNames defined using generics, below
All of the boilerplate up to that point could also be generated from the "base AST" using Template Haskell.
Having only one type family makes it easy to define extensions with catch-all clauses:
data UD
type instance X s UD = XUD s
type family XUD (s :: Symbol) :: Type where
XUD "KeyValueX" = Void
XUD "DataX" = Void
XUD _ = ()
As for the patterns, maybe just exposing the constructors is not so bad? GHC does that.
Imports and generics code to make this answer self-contained:
{-# LANGUAGE
DataKinds,
DeriveGeneric,
PolyKinds,
StandaloneDeriving,
TypeFamilies,
UndecidableInstances #-}
module T where
import Data.Kind (Constraint, Type)
import Data.Void
import GHC.Generics
import GHC.TypeLits
type CNames a = GCNames (Rep a)
type family GCNames (f :: Type -> Type) :: [Symbol] where
GCNames (M1 D c f) = GCNames f
GCNames (f :+: g) = GCNames f ++ GCNames g
GCNames (M1 C (MetaCons name _ _) f) = '[name]
type family (xs :: [k]) ++ (ys :: [k]) :: [k] where
'[] ++ ys = ys
(x ': xs) ++ ys = x ': (xs ++ ys)
type family AllX (c :: Type -> Constraint) (xs :: [Symbol]) (x :: l) :: Constraint where
AllX c '[] x = ()
AllX c (s ': ss) x = (c (X s x), AllX c ss x)
Gist: https://gist.github.com/Lysxia/3f6781b3a307a7e0c564920d6277bee2
Upvotes: 7