Nikolay Artamonov
Nikolay Artamonov

Reputation: 577

Why is `forever` in Haskell implemented this way?

Haskell provides a convenient function forever that repeats a monadic effect indefinitely. It can be defined as follows:

forever :: Monad m => m a -> m b
forever ma = ma >> forever ma

However, in the standard library the function is defined differently:

forever :: Monad m => m a -> m b
forever a = let a' = a *> a' in a'

The let binding is used to force "explicit sharing here, as it prevents a space leak regardless of optimizations" (from the comment on implementation).

Can you explain why the first definition potentially has space leaks?

Upvotes: 10

Views: 355

Answers (1)

Daniel Wagner
Daniel Wagner

Reputation: 153172

The execution engine starts off with a pointer to your loop, and lazily expands it as it needs to find out what IO action to execute next. With your definition of forever, here's what a few iterations of the loop like like in terms of "objects stored in memory":

1.
  PC
   |
   v
forever
   |
   v
  ma

2. 
PC
 |
 v
(>>) --> forever
 |         /
 v L------/
ma

3.
           PC
            |
            v
(>>) --> forever
 |         /
 v  L-----/
ma

4.
         PC
          |
          v
(>>) --> (>>) --> forever
 |        /          /
 v L-----/----------/
ma

5 and 6.
                  PC
                   |
                   v
(>>) --> (>>) --> (>>) --> forever
 |        /        /          /
 v L-----/--------/----------/
ma

The result is that as execution continues, you get more and more copies of (>>) cells. Under normal circumstances, this is no big deal; there's no reference to the first cell, so when a garbage collection happens, the already-executed prefix gets tossed out. But what if we accidentally pass an infinite loop as ma?

1.
  PC
   |
   v
forever
   |
   v
forever
   |
   v
  ma

2.
  PC
   |
   v
  (>>) -> forever
   |         /
   v L------/
forever
   |
   v
  ma

3.
                 return here
                 when done
                     |
                     v
         (>>) --> forever
          |          /
          v L-------/
PC --> forever
          |
          v
         ma

4.
               return here
                   |
                   v
       (>>) --> forever
        |          /
        v L-------/
PC --> (>>) --> forever
        |          /
        v L-------/
       ma

like, 12ish.
       return here
            |
            v
(>>) --> forever
 |          /
 v L-------/
(>>) --> (>>) --> (>>) --> (>>) --> (>>) --> forever <-- PC
 |        /        /        /        /          /
 v L-----/--------/--------/--------/----------/
ma

This time we can't garbage collect the prefix, because one "stack frame" up, we have a pointer to the top-level forever, which still refers to the first (>>)! Whoops. The fancier definition gets around this by making an in-memory cycle. There, forever ma's object looks more like this:

  /----\
 v     |
(*>) --/
 |
 v
ma

Now no extra (*>)'s need to get allocated (nor garbage collected) as execution proceeds -- even if we nest them. The execution pointer will simply move around and around within this graph.

Upvotes: 10

Related Questions