Reputation: 53
I'm trying to find out why the following code has a memory leak:
module Main where
import System.IO
func :: Int -> Int -> ([Int], Int)
func input 0 = ([], input)
func input numTimes = do
let (rest, ret) = func (input + 1) (numTimes - 1)
((input : rest), ret)
main :: IO ()
main = do
hSetBuffering stdout LineBuffering
let (list, final) = func 0 10000000000
listStr = map (\x -> (show x) ++ "\n") list
putStr (foldr (++) "" listStr)
putStr (show final)
printStrs :: [String] -> String -> IO ()
printStrs [] str = do
putStrLn str
printStrs (first : rest) str = do
putStr first
printStrs rest str
When I compile it with ghc --make Main
and run it, the top command shows it eating up more and more memory even though the amount of memory it uses should be constant because of lazy evaluation. I've tried using the printStrs function I wrote instead, and it still eats up all memory. I've tried using ghci on the code and using :sprint to print out the thunks from func and it seems like the thunks aren't increasing the amount of memory used for each evaluation of an element in the list.
I honestly don't know what else to do.
Upvotes: 1
Views: 488
Reputation: 27776
You mention in a comment that
I just want to get intermediate values so I can have some idea of how much progress the program is making and then take the final value as a separate return value
Let's try defining a special-purpose datatype which models the idea of "inspect some bit of progress, or get hold of the final result, if we have finished". Something like
{-# LANGUAGE DeriveFunctor #-}
data Progress a r = Emit a (Progress a r) | Result r
deriving Functor -- maps over the result value r, not over the as
Notice that, unlike ([Int], Int)
, Progress
doesn't give us "direct" access to the final result until we have gone trough all the nested Emit
constructors. Hopefully this will help us avoid unexpected dependencies between thunks.
Now let's define func
like this:
{-# LANGUAGE BangPatterns #-}
func :: Int -> Int -> Progress Int Int
func input 0 =
Result input
-- the bang avoids the accumulation of thunks behind the input param
func !input numTimes =
Emit input (func (input + 1) (numTimes - 1))
Notice that we don't need to go through all the recursive calls to get hold of the first progress "notification". Even if input
is 10000000000
, we can pattern-match on the outermost Emit
constructor after the first iteration!
The disadvantage of the Progress a r
datatype is that we can't easily use regular list functions to print the progress. But we can define our own:
printProgress :: Show a => Progress a r -> IO r
printProgress (Result r) =
pure r
printProgress (Emit a rest) =
do print a
printProgress rest
In practice, we often also want to be able to perform monadic effects at each "step". At that point, it's common to turn to some streaming library like streaming. If you squint a little, the Stream
type from "streaming" and similar libraries is basically an effectful list which returns a special result after reaching the end.
Upvotes: 1
Reputation: 44019
The problem is that func
will build a huge list and laziness will not be able to avoid it. It reminds me of continuation passing where the order of computations are sequentialized.
I think, the part with foldr
is responsible for the memory consumption. By avoiding it and compiling it with ghc -O3
, the memory usage is constant in my test:
module Main where
import System.IO
func :: Int -> Int -> ([Int], Int)
func input 0 = ([], input)
func input numTimes = do
let (rest, ret) = func (input + 1) (numTimes - 1)
((input : rest), ret)
main :: IO ()
main = do
hSetBuffering stdout LineBuffering
let (list, final) = func 0 10000000000
mapM_ (putStrLn . show) list
putStr (show final)
In ghci, it still blows the memory. But it might be because the interpreter is not able to optimize the recursion away.
Upvotes: 1