Avoiding code duplication for data type with lots of similar constructors

Question

I'm working on a writing simple parser in Haskell and have this datatype which holds the results of the parse.

data AST = Imm Integer
    | ArgName String
    | Arg Integer
    | Add AST AST
    | Sub AST AST
    | Mul AST AST
    | Div AST AST
    deriving (Show, Eq)

The problem comes when I want to map over the tree to replace variable names with its reference number using a map. I have to write this code

refVars :: M.Map String Integer -> AST -> Maybe AST
refVars d (ArgName s) = case d M.!? s of
                            Just n -> Just (Arg n)
                            Nothing -> Nothing
refVars _ (Imm n)     = Just $ Imm n
refVars _ (Arg n)     = Just $ Arg n                       
refVars d (Add a1 a2) = Add <$> refVars d a1 <*> refVars d a2
refVars d (Sub a1 a2) = Sub <$> refVars d a1 <*> refVars d a2
refVars d (Mul a1 a2) = Mul <$> refVars d a1 <*> refVars d a2
refVars d (Div a1 a2) = Div <$> refVars d a1 <*> refVars d a2

Which seems incredibly redundant. Ideally I'd want to have one pattern which matches any (op a1 a2) but Haskell won't allow that. Any suggestions?

Benjamin Hodgson · Accepted Answer

As proposed in the comments, the fix for your immediate problem is to move the information about the operator type out of the constructor:

data Op = Add | Sub | Mul | Div
data AST = Imm Integer
    | ArgName String
    | Arg Integer
    | Op Op AST AST

This datatype has one constructor for all of the binary operations, so you only need one line to take it apart:

refVars :: M.Map String Integer -> AST -> Maybe AST
refVars d (ArgName s)   = Arg <$> d !? s
refVars _ (Imm n)       = Just $ Imm n
refVars _ (Arg n)       = Just $ Arg n                       
refVars d (Op op a1 a2) = Op op <$> refVars d a1 <*> refVars d a2

You can handle all different types of binary operators without modifying refVars, but if you add different syntactic forms to your AST you'll have to add clauses to refVars.

data AST = -- other constructors as before
    | Ternary AST AST AST
    | List [AST]
    | Call AST [AST]  -- function and args

refVars -- other clauses as before
refVars d (Ternary cond tt ff) = Ternary <$> refVars d cond <*> refVars d tt <*> refVars d ff
refVars d (List l) = List <$> traverse (refVars d) l
refVars d (Call f args) = Call <$> refVars d f <*> traverse (refVars d) args

So it's still tedious - all this code does is traverse the tree to the leaves, whereupon refVars can scrutinise whether the leaf is an ArgName or otherwise. The interesting part of refVars is the one ArgName line; the remaining six lines of the function are pure boilerplate.

It'd be nice if we could define "traverse the tree" separately from "handle ArgNames". This is where generic programming comes in. There are lots of generic programming libraries out there, each with its own style and approach, but I'll demonstrate using lens.

The Control.Lens.Plated module defines a Plated class for types which know how to access their children. The deal is: you show lens how to access your children (by passing them to a callback g), and lens can recursively apply that to access the children's children and so on.

instance Plated AST where
    plate g (Op op a1 a2) = Op op <$> g a1 <*> g a2
    plate g (Ternary cond tt ff) = Ternary <$> g cond <*> g tt <*> g ff
    plate g (List l) = List <$> traverse g l
    plate g (Call f args) = Call <$> g f <*> traverse g args
    plate _ a = pure a

Aside: you might object that even writing plate clause-by-clause is rather too much boilerplate. The compiler should be able to locate the AST's children for you. lens has your back — there's a default implementation of plate for any type which is an instance of Data, so you should be able to slap deriving Data onto AST and leave the Plated instance empty.

Now we can implement refVars using transformM :: (Monad m, Plated a) => (a -> m a) -> a -> m a.

refVars :: M.Map String Integer -> AST -> Maybe AST
refVars d = transformM $ \case
    ArgName s -> Arg <$> d !? s
    x -> Just x

transformM takes a (monadic) transformation function and applies that to every descendant of the AST. Our transformation function searches for ArgName nodes and replaces them with Arg nodes, leaving any non-ArgNames unchanged.

For a more detailed explanation, see this paper (or the accompanying slides, if you prefer) by Neil Mitchell. It's what the Plated module is based on.

Avoiding code duplication for data type with lots of similar constructors

Answers (2)

Related Questions