Reputation: 2023
Various optimisation problems, like this one, led to Church encoded lists as a way to enable stream fusion, i.e the compiler's elimination of intermediate results (e.g. lists). Here's the definition that was used successfully in the optimisation problem:
{-# LANGUAGE RankNTypes #-}
-- A list encoded as a strict left fold.
newtype ListL a = ListL {build :: forall b. (b -> a -> b) -> b -> b}
Here's how I look at Church-somethings: Instead of asking what an 'something' is, ask what it can do for you. In the case of lists the answer is: Lists can be folded over. In order to fold, I need an 'update' function of type b->a->b
and a starting value of type b
. Then I will give you back the result of the fold, which is of type b
. Hence the definition of ListL
. Here are a few basic operations on ListL
:
mapL :: (a -> a') -> ListL a -> ListL a'
mapL f l = ListL (\f' b' -> build l (\b a -> f' b (f a)) b')
instance Functor ListL where fmap = mapL
fromList :: [a] -> ListL a
fromList l = ListL (\c z -> foldl' c z l)
toList :: ListL a -> [a]
toList l = build l snoc [] where snoc xs x = xs ++ [x]
nullL :: ListL a -> Bool
nullL l = build l (\_ _->False) True
Here's more:
filterL :: (a->Bool) -> ListL a -> ListL a
filterL p l = ListL (\f b->build l (\b' a'->if p a' then f b' a' else b') b)
iterUntil :: (a->Bool) -> a -> (a->a) -> ListL a
iterUntil p a f = ListL (\g b-> snd $ until (p.fst) (\(a',b')->(f a', g b' a')) (a,b))
iterUntil
iterates a function a->a
, starting with some value of type a
, until the predicate a->bool
is fulfilled. A function like Prelude's iterate
isn't possible - at least I don't know how to define it, it would have to be some kind of recursion.
Continuing with examples, length
and sum
are just exercises in choosing the right 'update' function and starting value in a foldl
:
lengthL :: ListL a -> Int
lengthL l = build l (\b _ -> b+1) 0
sumL :: Num a => ListL a -> a
sumL l = build l (+) 0
Now, let's try headL
:
headL :: ListL a -> a
headL l = build l (\_ a->a) _ -- this does not compile!
No matter what starting b
is provided, the first a
should be returned. build l
needs a b
, but we don't have one. This is a weird one: Basically we want to tell the compiler: You don't need the b
, trust me... A headL' :: ListL a -> ListL a
, on the other hand, is easy to construct. An error "empty list!"
in place of the hole _
doesn't work because it always gets called - laziness doesn't seem to take care of this. So, with headL
I'm stuck. Therefore here is
Question 1: How is headL
implemented?
The second issue appears when trying to implement the equivalent of repeatM :: Monad m => m a -> m [a]
. As with iterUntil
, predicate a->Bool
is needed to stop the iteration:
iterUntilM :: Monad m => (a->Bool) -> m a -> m (ListL a)
The purpose is clear: Repeat a monadic action m a
until a->Bool
is satisfied. The idea is, of course, to fold this ListL a
right away and achieve stream fusion (list fusion). For example:
import System.Random (randomIO)
main :: IO ()
main = do
rs <- iterUntilM (>42::Int) randomIO
print $ lengthL rs
The example is rather contrived, it prints the number of draws it took until the first number >42 was found. In a more realistic setting the monad m
is, for example, an ST s
monad that wraps some FFI. The point is: This should run efficiently. I'm thoroughly stuck with this one. How do I entangle the (>>=) :: m a -> (a->m b) -> m b
with build
to get a m (ListL a)
? I.e. this is
Question 2: How is iterUntilM
implemented?
Other than being a good learning exercise, is this actually a good idea?
Upvotes: 3
Views: 888
Reputation: 10091
In general, when you remove assumptions about a type, the function you write will not only be more general (in terms of what types it can use), it will also be more specific about what exactly it's doing. That's what's happening with the church-encoding allowing fusion: when lists are represented as
data [a] = [] | a : [a]
There are countless ways to use those in a function, only one of them being foldr
. However, when you have:
newtype List a = { runList :: forall b. (a -> b -> b) -> b -> b }
The only way to use that type is through foldr
. This is what lets you do the optimisations we know and love. Stream fusion is just one of them, by the way: you also get O(1) append, for instance.
Your type is more constrained still: it tells us that the underlying list can't be (meaningfully) infinite.
There's another constrained representation of lists that shifts the focus:
data List a = forall b. List b (b -> Maybe (a, b))
Where the church-encoded list is a consumer, this is a producer. It says nothing about how the list can be consumed, but an awful lot about how it can be made.
So we've seen that we get a lot from these constrained representations, what do we lose? tail
is a good example. For the producer:
tail (List x f) = case f x of
Just (_,xs) -> List xs f
And for the consumer:
tail xs =
List (\c n ->
runList xs
(\h t g -> g h (t c))
(const n)
(const id))
The consumer's implementation is O(n), whereas the producer's is obviously O(1).
Both of these types can allow for fusion, but some functions can be more efficiently implemented in one than the other. GHC happened to choose the former representation as its basis for fusion, but there's nothing fundamental that makes that choice superior: most functions Haskellers were using just seemed to work better in the foldr/build
pattern of fusion than the other one. In other places, the unfolding pattern is used.
That preamble out of the way, there are two questions we need to ask:
head
and iterUntilM
) work efficiently in only the foldr
representation (like append), or in the unfoldr
representation (like tail
), or both (like map
)?foldr
?), or can it be constrained even more?head
can be implemented on foldr
-encoded lists pretty easily:
head xs = runList xs const (error "head: empty list")
On foldl'
-lists, it's a little more complicated:
head xs =
fromMaybe
(error "head: empty list")
(build xs (\xs x -> maybe (Just x) Just xs) Nothing)
You'll notice that this function (like tail
on foldr
-lists) is O(n). It also doesn't work for infinite lists. This is a good indication that foldl'
isn't the right choice for fusing head
.
Now, for iterUntilM
, we see a case where (I don't think) even fusion is possible. Because the m
ends up on the outside, you have to run all of the effects in the list (materialising it).
For a good overview of this area, check out this blog post.
Upvotes: 5