Reputation: 819
I was trying "all possible" tricks I came across the internet, with "all possible" combinations of !
, seq
, deepseq
, ... but I could not find the way to suppress the leaking of memory in the following program
{-# LANGUAGE BangPatterns #-}
import Data.List
import Control.DeepSeq
foldl'' f z (x:xs) = let z' = f z x in z' `deepseq` foldl'' f z' xs
foldl'' _ z [] = z
statistics :: [[Double]] -> [(Double, Double)]
statistics (z:zs) = normalize $ foldl'' go acc zs
where acc = map (\x -> (x, x^2)) z
go = zipWith (\(a, b) x -> (a + x, b + x^2))
n = 1 + (length zs)
dn = fromIntegral n
normalize = map (\(a, b) -> (a / dn, (b - a^2 / dn) / dn))
main = mapM_ (putStrLn . show) (statistics ps)
where ps = take nn $ unfoldr (Just . splitAt mm) $ map sin [1..]
nn = 1000
mm = 1000
which calculates the mean and the variance of "mm
variables from nn
measurements".
Can you give me a hint?
EDIT
As pointed out in answers, the problem is in calling length
. So the better version could be
{-# LANGUAGE BangPatterns #-}
import Data.List
import Control.DeepSeq
foldl'' f z (x:xs) = let z' = f z x in z' `deepseq` foldl'' f z' xs
foldl'' _ z [] = z
statistics :: [[Double]] -> [(Double, Double)]
statistics (z:zs) = normalize $ foldl'' go acc zs
where acc = (1, map (\x -> (x, x^2)) z)
go = \(n, abs) xs -> (n + 1, zipWith (\(a, b) x -> (a + x, b + x^2)) abs xs)
normalize (n, abs) = map (\(a, b) -> (a / n, (b - a^2 / n) / n)) abs
main = mapM_ (putStrLn . show) (statistics ps)
where ps = take nn $ unfoldr (Just . splitAt mm) $ map sin [1..]
nn = 100
mm = 1000000
but it still leaks if mm
is large because of a thunk created by zipWith
. Instead I have tried
zipWith' f xs ys = go f [] xs ys
where go f zs (a:as) (b:bs) =
let zs' = (f a b) : zs in go f zs' as bs
go _ zs _ _ = foldl' (\x y -> y : x) [] zs
but unsuccessfully.
EDIT
The profiling output for the second improved example and for nn=100
and mm=1e6
:
leak +RTS -p -h -RTS
total time = 5.18 secs (5180 ticks @ 1000 us, 1 processor)
total alloc = 4,769,555,408 bytes (excludes profiling overheads)
COST CENTRE MODULE %time %alloc
main Main 67.4 64.0
main.ps Main 18.1 21.6
statistics.go.\ Main 6.7 11.7
statistics.go.\.\ Main 5.3 1.9
foldl'' Main 1.5 0.0
individual inherited
COST CENTRE no. entries %time %alloc %time %alloc
MAIN 46 0 0.0 0.0 100.0 100.0
main 94 0 67.4 64.0 67.4 64.0
CAF 91 0 0.0 0.0 32.6 36.0
main 92 1 0.0 0.0 32.6 36.0
main.mm 97 1 0.0 0.0 0.0 0.0
main.nn 96 1 0.0 0.0 0.0 0.0
main.ps 95 1 18.1 21.6 18.1 21.6
statistics 93 1 0.0 0.0 14.5 14.4
statistics.normalize 106 1 0.4 0.4 0.8 0.5
statistics.normalize.\ 107 100000 0.4 0.1 0.4 0.1
statistics.acc 102 1 0.2 0.3 0.3 0.3
statistics.acc.\ 104 100000 0.1 0.0 0.1 0.0
statistics.go 100 1 0.0 0.0 0.0 0.0
foldl'' 98 30 1.5 0.0 13.5 13.6
foldl''.z' 99 29 0.0 0.0 12.0 13.6
statistics.go 101 0 0.0 0.0 12.0 13.6
statistics.go.\ 103 29 6.7 11.7 12.0 13.6
statistics.go.\.\ 105 2900000 5.3 1.9 5.3 1.9
It seems, like Thomas M. DuBuisson suggested, that the "leak" is related to the construction of ps
, but how else could it be constructed?
Upvotes: 2
Views: 932
Reputation: 25763
Mostly a guess, but your call to length xs
to obtain n
forces the spine of the input list xs
, creating lots of thunks. For a quick test, use 1000
instead of dn
and see if it helps.
For full laziness, you could try to define normalize without having to pre-calculate the length of the list. But that is probably quite tricky to achieve...
In your edited code I believe is no real space leak, just expensive data structures. The code runs much faster and in 133MB of ram after switching to unboxed vectors. Note how I had to change nothing but to add a few V.
to functions:
{-# LANGUAGE BangPatterns #-}
import Data.List
import Control.DeepSeq
import qualified Data.Vector.Unboxed as V
foldl'' f z (x:xs) = let z' = f z x in z' `deepseq` foldl'' f z' xs
foldl'' _ z [] = z
statistics :: [V.Vector Double] -> V.Vector (Double, Double)
statistics (z:zs) = normalize $ foldl'' go acc zs
where acc = (1, V.map (\x -> (x, x^2)) z)
go = \(n, abs) xs -> (n + 1, V.zipWith (\(a, b) x -> (a + x, b + x^2)) abs xs)
normalize (n, abs) = V.map (\(a, b) -> (a / n, (b - a^2 / n) / n)) abs
main = V.mapM_ (putStrLn . show) (statistics ps)
where ps = take nn $ map V.fromList $ unfoldr (Just . splitAt mm) $ map sin [1..]
nn = 100
mm = 1000000
Running this with +RTS -s
gives these statistics (I pressed Ctrl-C while the output is running, hence the short execution time):
3,009,534,792 bytes allocated in the heap
324,022,224 bytes copied during GC
51,390,128 bytes maximum residency (16 sample(s))
4,896,504 bytes maximum slop
133 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 5617 colls, 0 par 0.20s 0.20s 0.0000s 0.0006s
Gen 1 16 colls, 0 par 0.10s 0.10s 0.0063s 0.0322s
INIT time 0.00s ( 0.00s elapsed)
MUT time 1.39s ( 5.59s elapsed)
GC time 0.30s ( 0.30s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 1.70s ( 5.89s elapsed)
%GC time 17.8% (5.1% elapsed)
Alloc rate 2,163,064,609 bytes per MUT second
Productivity 82.2% of total user, 23.7% of total elapsed
The maximum residency is roughly 50MB, which corresponds well to three copies of a 1000000 unboxed vectors of pairs of doubles being live at the same time during V.zipWith
. The difference between the 50MB of data on the heap and the 133MB of memory in use comes from the fact that we have a copying garbage collector (which double the demand) and probably some overhead from the run time system or other components.
Upvotes: 3