nponeccop
nponeccop

Reputation: 13677

Expression expansion using recursion schemes

I have a data type representing arithmetic expressions:

data E = Add E E | Mul E E | Var String

I want to write an expansion function which will convert an expression into sum of products of variables (sort of braces expansion). Using recursion schemes of course.

I only could think of an algorithm in the spirit of "progress and preservation". The algorithm at each step constructs terms that are fully expanded so there is no need to re-check.

The handling of Mul made me crazy, so instead of doing it directly I used an isomorphic type of [[String]] and took advantage of concat and concatMap already implemented for me:

type Poly = [Mono]
type Mono = [String]

mulMonoBy :: Mono -> Poly -> Poly
mulMonoBy x = map (x ++)

mulPoly :: Poly -> Poly -> Poly
mulPoly x = concatMap (flip mulMonoBy x)

So then I just use cata:

expandList :: E -> Poly
expandList = cata $ \case
   Var x -> [[x]]
   Add e1 e2 = e1 ++ e2
   Mul e1 e2 = mulPoly e1 e2

And convert back:

fromPoly :: Poly -> Expr
fromPoly = foldr1 Add . map fromMono where
   fromMono = foldr1 Mul . map Var

Are there significantly better approaches?

Upd: There are few confusions.

  1. The solution does allow multiline variable names. Add (Val "foo" (Mul (Val "foo) (Var "bar"))) is a representation of foo + foo * bar. I'm not representing x*y*z with Val "xyz" or something. Note that also as there are no scalars repeated vars such as "foo * foo * quux" are perfectly allowed.

  2. By sum of products I mean sort of "curried" n-ary sum of products. A concise definition of sum of products is that I want an expression without any parentheses, with all parens represented by associativity and priority.

So (foo * bar + bar) + (foo * bar + bar) is not a sum of products as the because of middle + is sum of sums

(foo * bar + (bar + (foo * bar + bar))) or corresponding left-associative version are right answers, although we must guarantee that associativity is always left of always right. So the correct type for right-assoaciative solution is

data Poly = Sum Mono Poly
          | Product Mono

which is isomorphic to nonempty lists: NonEmpty Poly (note Sum Mono Poly instead of Sum Poly Poly). If we allow empty sums or products then we get just the list of list representation I used.

  1. Also of you don't care about performance, the multiplication seems to be just liftA2 (++)

Upvotes: 3

Views: 646

Answers (2)

duplode
duplode

Reputation: 34378

This answer has three sections. The first section, a summary in which I present my two favourite solutions, is the most important one. The second section contains types and imports, as well as extended commentary on the way towards the solutions. The third section focuses on the task of reassociating expressions, something that the original version of the answer (i.e. the second section) had not given due attention.

At the end of the day, I ended up with two solutions worth discussing. The first one is expandDirect (cf. the third section):

expandDirect :: E a -> E a
expandDirect = cata alg
    where
    alg = \case
        Var' s -> Var s
        Add' x y -> apo coalgAdd (Add x y)
        Mul' x y -> (apo coalgAdd' . apo coalgMul) (Mul x y)
    coalgAdd = \case
        Add (Add x x') y -> Add' (Left x) (Right (Add x' y))
        x -> Left <$> project x
    coalgAdd' = \case
        Add (Add x x') y -> Add' (Left x) (Right (Add x' y))
        Add x (Add y y') -> Add' (Left x) (Right (Add y y'))
        x -> Left <$> project x
    coalgMul = \case
        Mul (Add x x') y -> Add' (Right (Mul x y)) (Right (Mul x' y))
        Mul x (Add y y') -> Add' (Right (Mul x y)) (Right (Mul x y'))
        x -> Left <$> project x

With it, we rebuild the tree from the bottom (cata). On every branch, if we find something invalid we walk back and rewrite the subtree (apo), redistributing and reassociating as needed until all immediate children are correctly arranged (apo makes it possible to do that without having to rewrite everyting down to the very bottom).

The second solution, expandMeta, is a much simplified version of expandFlat from the third section.

expandMeta :: E a -> E a
expandMeta = apo coalg . cata alg
    where
    alg = \case
        Var' s -> pure (Var s)
        Add' x y -> x <> y
        Mul' x y -> Mul <$> x <*> y
    coalg = \case
        x :| [] -> Left <$> project x
        x :| (y:ys) -> Add' (Left x) (Right (y :| ys))

expandMeta is a metamorphism; that is, a catamorphism followed by an anamorphism (while we are using apo here as well, an apomorphism is just a fancy kind of anamorphism, so I guess the nomenclature still applies). The catamorphism changes the tree into a non-empty list -- that implicitly handles the reassociation of the Adds -- with the list applicative being used to distribute multiplication (much like you suggest). The coalgebra then quite trivially converts the non-empty list back into a tree with the appropriate shape.


Thank you for the question -- I had a lot of fun with it! Preliminaries:

{-# LANGUAGE LambdaCase #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE DeriveFunctor #-}
{-# LANGUAGE DeriveFoldable #-}
{-# LANGUAGE GeneralizedNewtypeDeriving #-}

import Data.Functor.Foldable
import qualified Data.List.NonEmpty as N
import Data.List.NonEmpty (NonEmpty(..))
import Data.Semigroup
import Data.Foldable (toList)
import Data.List (nub)
import qualified Data.Map as M
import Data.Map (Map, (!))
import Test.QuickCheck

data E a = Var a | Add (E a) (E a) | Mul (E a) (E a)
    deriving (Eq, Show, Functor, Foldable)

data EF a b = Var' a | Add' b b | Mul' b b
    deriving (Eq, Show, Functor)

type instance Base (E a) = EF a

instance Recursive (E a) where
    project = \case
        Var x -> Var' x
        Add x y -> Add' x y
        Mul x y -> Mul' x y

instance Corecursive (E a) where
    embed = \case
        Var' x -> Var x
        Add' x y -> Add x y
        Mul' x y -> Mul x y

To begin with, my first working (if flawed) attempt, which uses the applicative instance of (non-empty) lists to distribute:

expandTooClever :: E a -> E a
expandTooClever = cata $ \case
    Var' s -> Var s
    Add' x y -> Add x y
    Mul' x y -> foldr1 Add (Mul <$> flatten x <*> flatten y)
    where
    flatten :: E a -> NonEmpty (E a)
    flatten = cata $ \case
        Var' s -> pure (Var s)
        Add' x y -> x <> y
        Mul' x y -> pure (foldr1 Mul (x <> y))

expandTooClever has one relatively serious problem: as it calls flatten, a full-blown fold, for both subtrees whenever it reaches a Mul, it has horrible asymptotics for chains of Mul.

Brute force, simplest-thing-that-could-possibly-work solution, with an algebra that calls itself recursively:

expandBrute :: E a -> E a
expandBrute = cata alg
    where
    alg = \case
        Var' s -> Var s
        Add' x y -> Add x y
        Mul' (Add x x') y -> Add (alg (Mul' x y)) (alg (Mul' x' y))
        Mul' x (Add y y') -> Add (alg (Mul' x y)) (alg (Mul' x y'))
        Mul' x y -> Mul x y

The recursive calls are needed because the distribution might introduce new occurrences of Add under Mul.

A slightly more tasteful variant of expandBrute, with the recursive call factored out into a separate function:

expandNotSoBrute :: E a -> E a
expandNotSoBrute = cata alg
    where
    alg = \case
        Var' s -> Var s
        Add' x y -> Add x y
        Mul' x y -> dis x y
    dis (Add x x') y = Add (dis x y) (dis x' y)
    dis x (Add y y') = Add (dis x y) (dis x y')
    dis x y = Mul x y

A tamed expandNotSoBrute, with dis being turned into an apomorphism. This way of phrasing it expresses nicely the big picture of what is going on: if you only have Vars and Adds, you can trivially reproduce the tree bottom-up without a care in the world; if you hit a Mul, however, you have to go back and reconstuct the whole subtree to perform the distributions (I wonder is there is a specialised recursion scheme that captures this pattern).

expandEvert :: E a -> E a
expandEvert = cata alg
    where
    alg :: EF a (E a) -> E a
    alg = \case
        Var' s -> Var s
        Add' x y -> Add x y
        Mul' x y -> apo coalg (x, y)
    coalg :: (E a, E a) -> EF a (Either (E a) (E a, E a))
    coalg (Add x x', y) = Add' (Right (x, y)) (Right (x', y))
    coalg (x, Add y y') = Add' (Right (x, y)) (Right (x, y'))
    coalg (x, y) = Mul' (Left x) (Left y)

apo is necessary because we want to anticipate the final result if there is nothing else to distribute. (There is a way to write it with ana; however, that requires wastefully rebuilding trees of Muls without changes, which leads to the same asymptotics problem expandTooClever had.)

Last, but not least, a solution which is both a successful realisation of what I had attempted with expandTooClever and my interpretation of amalloy's answer. BT is a garden-variety binary tree with values on the leaves. A product is represented by a BT a, while a sum of products is a tree of trees.

expandSOP :: E a -> E a
expandSOP = cata algS . fmap (cata algP) . cata algSOP
    where
    algSOP :: EF a (BT (BT a)) -> BT (BT a)
    algSOP = \case
        Var' s -> pure (pure s)
        Add' x y -> x <> y
        Mul' x y -> (<>) <$> x <*> y
    algP :: BTF a (E a) -> E a
    algP = \case
        Leaf' s -> Var s
        Branch' x y -> Mul x y
    algS :: BTF (E a) (E a) -> E a
    algS = \case
        Leaf' x -> x
        Branch' x y -> Add x y

BT and its instances:

data BT a = Leaf a | Branch (BT a) (BT a)
    deriving (Eq, Show)

data BTF a b = Leaf' a | Branch' b b
    deriving (Eq, Show, Functor)

type instance Base (BT a) = BTF a

instance Recursive (BT a) where
    project (Leaf s) = Leaf' s
    project (Branch l r) = Branch' l r

instance Corecursive (BT a) where
    embed (Leaf' s) = Leaf s
    embed (Branch' l r) = Branch l r

instance Semigroup (BT a) where
    l <> r = Branch l r

-- Writing this, as opposed to deriving it, for the sake of illustration.
instance Functor BT where
    fmap f = cata $ \case
        Leaf' x -> Leaf (f x)
        Branch' l r -> Branch l r

instance Applicative BT where
    pure x = Leaf x
    u <*> v = ana coalg (u, v)
        where
        coalg = \case
            (Leaf f, Leaf x) -> Leaf' (f x)
            (Leaf f, Branch xl xr) -> Branch' (Leaf f, xl) (Leaf f, xr)
            (Branch fl fr, v) -> Branch' (fl, v) (fr, v)

To wrap things up, a test suite:

newtype TestE = TestE { getTestE :: E Char }
    deriving (Eq, Show)

instance Arbitrary TestE where
    arbitrary = TestE <$> sized genExpr
        where
        genVar = Var <$> choose ('a', 'z')
        genAdd n = Add <$> genSub n <*> genSub n
        genMul n = Mul <$> genSub n <*> genSub n
        genSub n = genExpr (n `div` 2)
        genExpr = \case
            0 -> genVar
            n -> oneof [genVar, genAdd n, genMul n]

data TestRig b = TestRig (Map Char b) (E Char)
    deriving (Show)

instance Arbitrary b => Arbitrary (TestRig b) where
    arbitrary = do
        e <- genExpr
        d <- genDict e
        return (TestRig d e)
        where
        genExpr = getTestE <$> arbitrary
        genDict x = M.fromList . zip (keys x) <$> (infiniteListOf arbitrary)
        keys = nub . toList

unsafeSubst :: Ord a => Map a b -> E a -> E b
unsafeSubst dict = fmap (dict !)

eval :: Num a => E a -> a
eval = cata $ \case
    Var' x -> x
    Add' x y -> x + y
    Mul' x y -> x * y

evalRig :: (E Char -> E Char) -> TestRig Integer -> Integer
evalRig f (TestRig d e) = eval (unsafeSubst d (f e))

mkPropEval :: (E Char -> E Char) -> TestRig Integer -> Bool
mkPropEval f = (==) <$> evalRig id <*> evalRig f

isDistributed :: E a -> Bool
isDistributed = para $ \case
    Add' (_, x) (_, y) -> x && y
    Mul' (Add _ _, _) _ -> False
    Mul' _ (Add _ _, _) -> False
    Mul' (_, x) (_, y) -> x && y
    _ -> True

mkPropDist :: (E Char -> E Char) -> TestE -> Bool
mkPropDist f = isDistributed . f . getTestE

main = mapM_ test
    [ ("expandTooClever" , expandTooClever)
    , ("expandBrute"     , expandBrute)
    , ("expandNotSoBrute", expandNotSoBrute)
    , ("expandEvert"     , expandEvert)
    , ("expandSOP"       , expandSOP)
    ]
    where
    test (header, func) = do
        putStrLn $ "Testing: " ++ header
        putStr "Evaluation test:   "
        quickCheck $ mkPropEval func
        putStr "Distribution test: "
        quickCheck $ mkPropDist func

By sum of products I mean sort of "curried" n-ary sum of products. A concise definition of sum of products is that I want an expression without any parentheses, with all parens represented by associativity and priority.

We can adjust the solutions above so that the sums are reassociated. The easiest way is replacing the outer BT in expandSOP with NonEmpty. Given that the multiplication there is, much like you suggest, liftA2 (<>), this works straight away.

expandFlat :: E a -> E a
expandFlat = cata algS . fmap (cata algP) . cata algSOP
    where
    algSOP :: EF a (NonEmpty (BT a)) -> NonEmpty (BT a)
    algSOP = \case
        Var' s -> pure (Leaf s)
        Add' x y -> x <> y
        Mul' x y -> (<>) <$> x <*> y
    algP :: BTF a (E a) -> E a
    algP = \case
        Leaf' s -> Var s
        Branch' x y -> Mul x y
    algS :: NonEmptyF (E a) (E a) -> E a
    algS = \case
        NonEmptyF x Nothing -> x
        NonEmptyF x (Just y) -> Add x y

Another option is using any of the other solutions and reassociating the sums in the distributed tree in a separate step.

flattenSum :: E a -> E a
flattenSum = cata alg
    where
    alg = \case
        Add' x y -> apo coalg (x, y)
        x -> embed x
    coalg = \case
        (Add x x', y) -> Add' (Left x) (Right (x', y))
        (x, y) -> Add' (Left x) (Left y)

We can also roll flattenSum and expandEvert into a single function. Note that the sum coalgebra needs an extra case when it gets the result of the distribution coalgebra. That happens because, as the coalgebra proceeds from top to bottom, we can't be sure that the subtrees it generates are properly associated.

-- This is written in a slightly different style than the previous functions.
expandDirect :: E a -> E a
expandDirect = cata alg
    where
    alg = \case
        Var' s -> Var s
        Add' x y -> apo coalgAdd (Add x y)
        Mul' x y -> (apo coalgAdd' . apo coalgMul) (Mul x y)
    coalgAdd = \case
        Add (Add x x') y -> Add' (Left x) (Right (Add x' y))
        x -> Left <$> project x
    coalgAdd' = \case
        Add (Add x x') y -> Add' (Left x) (Right (Add x' y))
        Add x (Add y y') -> Add' (Left x) (Right (Add y y'))
        x -> Left <$> project x
    coalgMul = \case
        Mul (Add x x') y -> Add' (Right (Mul x y)) (Right (Mul x' y))
        Mul x (Add y y') -> Add' (Right (Mul x y)) (Right (Mul x y'))
        x -> Left <$> project x

Perhaps there is a more clever way of writing expandDirect, but I haven't figured it out yet.

Upvotes: 1

amalloy
amalloy

Reputation: 91837

I am no expert in recursion schemes, but since it sounds like you are trying to practice them, hopefully you will not find it too onerous to convert a solution using manual recursion to one using recursion schemes. I'll write it with mixed prose and code first, and include the complete code again at the end for simpler copy/pasting.

It is not too difficult to do using simply the distributive property and a bit of recursive algebra. Before we begin, though, let's define a better result type, one that guarantees we can only ever represent sums of products:

data Poly term = Sum (Poly term) (Poly term)
               | Product (Mono term) 
               deriving Show

data Mono term = Term term
               | MonoMul (Mono term) (Mono term)
               deriving Show

This way we can't possibly mess up and accidentally yield an incorrect result like

(Mul (Var "x") (Add (Var "y") (Var "z")))

Now, let's write our function.

expand :: E -> Poly String

First, a base case: it is trivial to expand a Var, because it is already in sum-of-products form. But we must convert it a bit to fit it into our Poly result type:

expand (Var x) = Product (Term x)

Next, note that it is easy to expand an addition: simply expand the two sub-expressions, and add them together.

expand (Add x y) = Sum (expand x) (expand y)

What about a multiplication? That is a bit more complicated, since

Product (expand x) (expand y)

is ill-typed: we can't multiply polynomials, only monomials. But we do know how to do algebraic manipulation to turn a multiplication of polynomials into a sum of multiplications of monomials, via the distributive rule. As in your question, we'll need a function mulPoly. But let's just assume that exists, and implement it later.

expand (Mul x y) = mulPoly (expand x) (expand y)

That handles all the cases, so all that's left is to implement mulPoly by distributing the multiplications across the two polynomials' terms. We simply break down one of the polynomials one term at a time, and multiply the term across each of the terms in the other polynomial, adding together the results.

mulPoly :: Poly String -> Poly String -> Poly String
mulPoly (Product x) y = mulMonoBy x y
mulPoly (Sum a b) x = Sum (mulPoly a x) (mulPoly b x)

mulMonoBy :: Mono String -> Poly -> Poly
mulMonoBy x (Product y) = Product $ MonoMul x y
mulMonoBy x (Sum a b) = Sum (mulPoly a x') (mulPoly b x')
  where x' = Product x

And in the end, we can test that it works as intended:

expand (Mul (Add (Var "a") (Var "b")) (Add (Var "y") (Var "z")))
{- results in: Sum (Sum (Product (MonoMul (Term "y") (Term "a"))) 
                        (Product (MonoMul (Term "z") (Term "a")))) 
                   (Sum (Product (MonoMul (Term "y") (Term "b"))) 
                        (Product (MonoMul (Term "z") (Term "b"))))
-}

Or,

(a + b)(y * z) = ay + az + by + bz

which we know to be correct.

The complete solution, as promised above:

data E = Add E E | Mul E E | Var String

data Poly term = Sum (Poly term) (Poly term)
               | Product (Mono term) 
               deriving Show

data Mono term = Term term
               | MonoMul (Mono term) (Mono term)
               deriving Show

expand :: E -> Poly String
expand (Var x) = Product (Term x)
expand (Add x y) = Sum (expand x) (expand y)
expand (Mul x y) = mulPoly (expand x) (expand y)

mulPoly :: Poly String -> Poly String -> Poly String
mulPoly (Product x) y = mulMonoBy x y
mulPoly (Sum a b) x = Sum (mulPoly a x) (mulPoly b x)

mulMonoBy :: Mono String -> Poly String -> Poly String
mulMonoBy x (Product y) = Product $ MonoMul x y
mulMonoBy x (Sum a b) = Sum (mulPoly a x') (mulPoly b x')
  where x' = Product x

main = print $ expand (Mul (Add (Var "a") (Var "b")) (Add (Var "y") (Var "z")))

Upvotes: 1

Related Questions