Benjamin Batistic

Reputation: 819

Haskell: how to get rid of the memory leak

I have been trying "all possible" tricks I came across on the internet, with "all possible" combinations of !, seq, deepseq, ..., but I could not find a way to suppress the memory leak in the following program:

{-# LANGUAGE BangPatterns #-}

import Data.List
import Control.DeepSeq

-- strict left fold that fully evaluates the accumulator at every step
foldl'' f z (x:xs) = let z' = f z x in z' `deepseq` foldl'' f z' xs
foldl'' _ z [] = z

statistics :: [[Double]] -> [(Double, Double)]
statistics (z:zs) = normalize $ foldl'' go acc zs
  where acc = map (\x -> (x, x^2)) z
        go = zipWith (\(a, b) x -> (a + x, b + x^2))
        n  = 1 + (length zs)  
        dn = fromIntegral n
        normalize = map (\(a, b) -> (a / dn, (b - a^2 / dn) / dn))

main =  mapM_ (putStrLn . show) (statistics ps)
   where ps = take nn $ unfoldr (Just . splitAt mm) $ map sin [1..]
         nn = 1000
         mm = 1000

which calculates the mean and the variance of mm variables over nn measurements.
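For reference, per column this is just the usual mean and (population) variance, which normalize recovers from the running sums (a, b) = (sum of x, sum of x^2); a standalone restatement for a single column (my own sketch, not part of the program above):

-- (mean, variance) of one column, written out directly
columnStats :: [Double] -> (Double, Double)
columnStats xs = (mean, sumSq / n - mean ^ 2)
  where
    n     = fromIntegral (length xs)
    mean  = sum xs / n
    sumSq = sum (map (^ 2) xs)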

Can you give me a hint?

EDIT

As pointed out in the answers, the problem is the call to length. So a better version could be

{-# LANGUAGE BangPatterns #-}

import Data.List
import Control.DeepSeq

foldl'' f z (x:xs) = let z' = f z x in z' `deepseq` foldl'' f z' xs 
foldl'' _ z [] = z

statistics :: [[Double]] -> [(Double, Double)]
statistics (z:zs) = normalize $ foldl'' go acc zs
   where acc = (1, map (\x -> (x, x^2)) z)
         go = \(n, abs) xs -> (n + 1, zipWith (\(a, b) x -> (a + x, b + x^2)) abs xs)
         normalize (n, abs) = map (\(a, b) -> (a / n, (b - a^2 / n) / n)) abs

main =  mapM_ (putStrLn . show) (statistics ps)
   where ps = take nn $ unfoldr (Just . splitAt mm) $ map sin [1..]
         nn = 100
         mm = 1000000

but it still leaks when mm is large, because of the thunks created by zipWith. Instead I tried

-- spine-strict zipWith: accumulates the results in reverse and then
-- reverses them back, so the whole result list is built eagerly
-- (the elements f a b themselves are still built lazily)
zipWith' f xs ys = go f [] xs ys
   where go f zs (a:as) (b:bs) =
             let zs' = (f a b) : zs in go f zs' as bs
         go _ zs _ _ = foldl' (\x y -> y : x) [] zs

but unsuccessfully.

EDIT

The profiling output for the second, improved example with nn = 100 and mm = 1e6:

   leak +RTS -p -h -RTS

total time  =        5.18 secs   (5180 ticks @ 1000 us, 1 processor)
total alloc = 4,769,555,408 bytes  (excludes profiling overheads)

    COST CENTRE       MODULE  %time %alloc

    main              Main     67.4   64.0
    main.ps           Main     18.1   21.6
    statistics.go.\   Main      6.7   11.7
    statistics.go.\.\ Main      5.3    1.9
    foldl''           Main      1.5    0.0


                                               individual     inherited
 COST CENTRE                 no.     entries  %time %alloc   %time %alloc

 MAIN                         46           0    0.0    0.0   100.0  100.0
  main                        94           0   67.4   64.0    67.4   64.0
  CAF                         91           0    0.0    0.0    32.6   36.0
   main                       92           1    0.0    0.0    32.6   36.0
    main.mm                   97           1    0.0    0.0     0.0    0.0
    main.nn                   96           1    0.0    0.0     0.0    0.0
    main.ps                   95           1   18.1   21.6    18.1   21.6
    statistics                93           1    0.0    0.0    14.5   14.4
     statistics.normalize    106           1    0.4    0.4     0.8    0.5
      statistics.normalize.\ 107      100000    0.4    0.1     0.4    0.1
     statistics.acc          102           1    0.2    0.3     0.3    0.3
      statistics.acc.\       104      100000    0.1    0.0     0.1    0.0
     statistics.go           100           1    0.0    0.0     0.0    0.0
     foldl''                  98          30    1.5    0.0    13.5   13.6
      foldl''.z'              99          29    0.0    0.0    12.0   13.6
       statistics.go         101           0    0.0    0.0    12.0   13.6
        statistics.go.\      103          29    6.7   11.7    12.0   13.6
         statistics.go.\.\   105     2900000    5.3    1.9     5.3    1.9

(heap profile graph from the -h run not reproduced here)

It seems, as Thomas M. DuBuisson suggested, that the "leak" is related to the construction of ps, but how else could it be constructed?

Upvotes: 2

Views: 932

Answers (1)

Joachim Breitner

Reputation: 25763

Mostly a guess, but your call to length zs to obtain n retains the entire input list: n is only demanded at the very end, so zs (a long list of thunks) cannot be garbage-collected while the fold consumes it. For a quick test, use 1000 instead of dn and see if it helps.
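To make the retention pattern concrete, here is a minimal toy example (my own sketch, not your code):

import Data.List (foldl')

-- The sum and the length each traverse xs; whichever runs first, the
-- other expression still holds a reference to xs, so the entire list
-- stays live and memory grows linearly with its length.
meanLeaky :: [Double] -> Double
meanLeaky xs = foldl' (+) 0 xs / fromIntegral (length xs)

main :: IO ()
main = print (meanLeaky (map sin [1 .. 1000000]))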

For full laziness, you could try to define normalize without having to pre-calculate the length of the list. But that is probably quite tricky to achieve...
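A less ambitious workaround (a sketch of the idea, and essentially what the edited code in your question already does) is to count inside the fold accumulator instead of calling length up front:

{-# LANGUAGE BangPatterns #-}
import Data.List (foldl')

-- Single pass: the list can be garbage-collected as it is consumed,
-- and the bang patterns force the running sums at each step so no
-- long chains of thunks build up.
meanOnePass :: [Double] -> Double
meanOnePass xs = s / n
  where
    (s, n) = foldl' step (0, 0) xs
    step (!acc, !cnt) x = (acc + x, cnt + 1)

main :: IO ()
main = print (meanOnePass (map sin [1 .. 1000000]))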

In your edited code I believe there is no real space leak, just expensive data structures. The code runs much faster and in 133 MB of RAM after switching to unboxed vectors. Note that I did not have to change anything except adding a few V. prefixes to functions:

{-# LANGUAGE BangPatterns #-}

import Data.List
import Control.DeepSeq
import qualified Data.Vector.Unboxed as V

foldl'' f z (x:xs) = let z' = f z x in z' `deepseq` foldl'' f z' xs 
foldl'' _ z [] = z

statistics :: [V.Vector Double] -> V.Vector (Double, Double)
statistics (z:zs) = normalize $ foldl'' go acc zs
   where acc = (1, V.map (\x -> (x, x^2)) z)
         go = \(n, abs) xs -> (n + 1, V.zipWith (\(a, b) x -> (a + x, b + x^2)) abs xs)
         normalize (n, abs) = V.map (\(a, b) -> (a / n, (b - a^2 / n) / n)) abs

main =  V.mapM_ (putStrLn . show) (statistics ps)
   where ps = take nn $ map V.fromList $ unfoldr (Just . splitAt mm) $ map sin [1..]
         nn = 100
         mm = 1000000

Running this with +RTS -s gives these statistics (I pressed Ctrl-C while the output was still running, hence the short execution time):

   3,009,534,792 bytes allocated in the heap
     324,022,224 bytes copied during GC
      51,390,128 bytes maximum residency (16 sample(s))
       4,896,504 bytes maximum slop
             133 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0      5617 colls,     0 par    0.20s    0.20s     0.0000s    0.0006s
  Gen  1        16 colls,     0 par    0.10s    0.10s     0.0063s    0.0322s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    1.39s  (  5.59s elapsed)
  GC      time    0.30s  (  0.30s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time    1.70s  (  5.89s elapsed)

  %GC     time      17.8%  (5.1% elapsed)

  Alloc rate    2,163,064,609 bytes per MUT second

  Productivity  82.2% of total user, 23.7% of total elapsed

The maximum residency is roughly 50 MB, which corresponds well to three copies of a vector of 1,000,000 unboxed pairs of Doubles being live at the same time during V.zipWith. The difference between the 50 MB of data on the heap and the 133 MB of memory in use comes from the fact that we use a copying garbage collector (which doubles the demand) and probably some overhead from the runtime system or other components.
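The back-of-the-envelope arithmetic behind that estimate (my own rough numbers): an unboxed vector of pairs is stored as two flat arrays of Doubles, with no per-element pointers, so each pair costs just 16 bytes.

-- Rough check of the residency figure (estimates, not measured values):
bytesPerPair, bytesPerVector, threeVectors :: Int
bytesPerPair   = 2 * 8                    -- two unboxed Doubles
bytesPerVector = 1000000 * bytesPerPair   -- one vector of pairs, ~16 MB
threeVectors   = 3 * bytesPerVector       -- three live at once, ~48 MB

main :: IO ()
main = print (bytesPerPair, bytesPerVector, threeVectors)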

Upvotes: 3
