ryachza
ryachza

Reputation: 4540

Monadic Fold in Constant Space

How can I fold a lazy list using a monadic action in constant space? The problem I'm trying to solve is aggregating a large file, and I believe for the sake of performance I require mutability. I have an implementation working in ST using mutable vectors, but it uses too much memory. Below is an example of what I'm attempting. I also experimented briefly with Conduit but that didn't appear to provide any improvement.

ST forM_:

import Control.Monad (forM_)
import Control.Monad.ST.Trans as STT
import Control.Monad.Identity as Identity

testST :: Int
testST = do
  Identity.runIdentity $ STT.runST $ do
    a <- STT.newSTRef 0
    forM_ [1..10000000] (\x -> do
        a' <- STT.readSTRef a
        STT.writeSTRef a (a' + x)
      )
    STT.readSTRef a

Conduit:

import Data.Conduit (($=),(=$),($$))
import qualified Data.Conduit as C
import qualified Data.Conduit.List as CL

testCL :: IO Int
testCL = CL.sourceList [1..10000000] $$ CL.foldM (\a x -> return (a + x)) 0

Upvotes: 9

Views: 155

Answers (1)

Daniel Wagner
Daniel Wagner

Reputation: 152867

The problem is not with the fold, but with the fold body. This program allocates a lot:

testST = runST $ do
    ref <- newSTRef 0
    forM_ [1..10000000] $ \x -> do
         val <- readSTRef ref
         writeSTRef ref (val + x)
    readSTRef ref

This program, whose only difference is on the writeSTRef line, allocates almost nothing:

testST = runST $ do
    ref <- newSTRef 0
    forM_ [1..10000000] $ \x -> do
        val <- readSTRef ref
        writeSTRef ref $! val + x
    readSTRef ref

The difference between the two pieces of code is a good hint to what's going on: in the former, you are creating a reference to a deeply-nested thunk with 10000000 layers of applications of +; whereas the latter flattens the thunk at each step.

By the way, this common pitfall is explicitly called out in the documentation for modifySTRef.

Upvotes: 15

Related Questions