Noah Williams
Noah Williams

Reputation: 53

How do I get rid of this memory leak in Haskell?

I'm trying to find out why the following code has a memory leak:

module Main where

import System.IO

func :: Int -> Int -> ([Int], Int)
func input 0 = ([], input)
func input numTimes = do
    let (rest, ret) = func (input + 1) (numTimes - 1)
    ((input : rest), ret)

main :: IO ()
main = do
    hSetBuffering stdout LineBuffering
    let (list, final) = func 0 10000000000
        listStr = map (\x -> (show x) ++ "\n") list
    putStr (foldr (++) "" listStr)
    putStr (show final)

printStrs :: [String] -> String -> IO ()
printStrs [] str = do
    putStrLn str
printStrs (first : rest) str = do
    putStr first
    printStrs rest str

When I compile it with ghc --make Main and run it, the top command shows it eating up more and more memory even though the amount of memory it uses should be constant because of lazy evaluation. I've tried using the printStrs function I wrote instead, and it still eats up all memory. I've tried using ghci on the code and using :sprint to print out the thunks from func and it seems like the thunks aren't increasing the amount of memory used for each evaluation of an element in the list.

I honestly don't know what else to do.

Upvotes: 1

Views: 488

Answers (2)

danidiaz
danidiaz

Reputation: 27776

You mention in a comment that

I just want to get intermediate values so I can have some idea of how much progress the program is making and then take the final value as a separate return value

Let's try defining a special-purpose datatype which models the idea of "inspect some bit of progress, or get hold of the final result, if we have finished". Something like

{-# LANGUAGE DeriveFunctor #-}
data Progress a r = Emit a (Progress a r) | Result r
    deriving Functor -- maps over the result value r, not over the as

Notice that, unlike ([Int], Int), Progress doesn't give us "direct" access to the final result until we have gone trough all the nested Emit constructors. Hopefully this will help us avoid unexpected dependencies between thunks.

Now let's define func like this:

{-# LANGUAGE BangPatterns #-}
func :: Int -> Int -> Progress Int Int
func input 0 = 
    Result input
-- the bang avoids the accumulation of thunks behind the input param
func !input numTimes = 
    Emit input (func (input + 1) (numTimes - 1))

Notice that we don't need to go through all the recursive calls to get hold of the first progress "notification". Even if input is 10000000000, we can pattern-match on the outermost Emit constructor after the first iteration!

The disadvantage of the Progress a r datatype is that we can't easily use regular list functions to print the progress. But we can define our own:

printProgress :: Show a => Progress a r -> IO r
printProgress (Result r) = 
    pure r
printProgress (Emit a rest) = 
    do print a
       printProgress rest

In practice, we often also want to be able to perform monadic effects at each "step". At that point, it's common to turn to some streaming library like streaming. If you squint a little, the Stream type from "streaming" and similar libraries is basically an effectful list which returns a special result after reaching the end.

Upvotes: 1

Philipp Claßen
Philipp Claßen

Reputation: 44019

The problem is that func will build a huge list and laziness will not be able to avoid it. It reminds me of continuation passing where the order of computations are sequentialized.

I think, the part with foldr is responsible for the memory consumption. By avoiding it and compiling it with ghc -O3, the memory usage is constant in my test:

module Main where

import System.IO

func :: Int -> Int -> ([Int], Int)
func input 0 = ([], input)
func input numTimes = do
    let (rest, ret) = func (input + 1) (numTimes - 1)
    ((input : rest), ret)

main :: IO ()
main = do
    hSetBuffering stdout LineBuffering
    let (list, final) = func 0 10000000000
    mapM_ (putStrLn . show) list
    putStr (show final)

In ghci, it still blows the memory. But it might be because the interpreter is not able to optimize the recursion away.

Upvotes: 1

Related Questions