Mike Izbicki
Mike Izbicki

Reputation: 6366

Can a thunk be duplicated to improve memory performance?

One of my struggles with lazy evaluation in Haskell is the difficulty of reasoning about memory usage. I think the ability to duplicate a thunk would make this much easier for me. Here's an example.

Let's create a really big list:

let xs = [1..10000000]

Now, let's create a bad function:

bad = do
    print $ foldl1' (+) xs
    print $ length xs

With no optimizations, this eats up a few dozen MB of ram. The garbage collector can't deallocate xs during the fold because it will be needed for calculating the length later.

Is it possible to reimplement this function something like this:

good = do
    (xs1,xs2) <- copyThunk xs
    print $ foldl1' (+) xs1
    print $ length xs2

Now, xs1 and xs2 would represent the same value, but also be independent of each other in memory so the garbage collector can deallocate during the fold preventing memory wasting. (I think this would slightly increase the computational cost though?)

Obviously in this trivial example, refactoring the code could easily solve this problem, but It seems like it's not always obvious how to refactor. Or sometimes refactoring would greatly reduce code clarity.

Upvotes: 26

Views: 920

Answers (3)

Joachim Breitner
Joachim Breitner

Reputation: 25782

I was wondering the same thing a while ago and created a prototypical implementation of such a thunk-duplication function. You can read about the result in my preprint „dup – Explicit un-sharing in haskell” and see the code at http://darcs.nomeata.de/ghc-dup. Unfortunately, the paper was neither accepted for the Haskell Symposium nor the Haskell Implementors Workshop this year.

To my knowledge, there is no real-world-ready solution to the problem; only fragile work-arounds as the unit parameter trick that might break due to one or the other compiler optimizations.

Upvotes: 20

Tarrasch
Tarrasch

Reputation: 10557

Interesting question. I don't know how to implement copyThunk. But there is something else you can do (sorry if you already knew this):

xsFunction :: () -> [Int]
xsFunction = const [1..10000000]

better = do
  print $ foldl1' (+) $ xsFunction ()
  print $ length $ xsFunction ()

Here it definitely won't put the expression xsFunction () in a thunk, it will be calculated twice thus not making any memory bloat.


An interesting follow up on this is:

  • Can one ever implement copyThunk?
  • Should a haskell programmer ever be messing around with this relatively low level optimizations? Can't we assume ghc to outsmart us on this?

Upvotes: 4

ertes
ertes

Reputation: 29

Turn xs into a function. This may be ugly, but works, because it prevents sharing:

let xs () = [1..1000000]

good = do
    print $ foldl1' (+) (xs ())
    print $ length (xs ())

Upvotes: 2

Related Questions