Reputation: 3434
I use the following code to memoize the total stopping time of Collatz function by using a state monad to cache input-result pairs.
Additionally the snd
part of the state is used to keep track of the input value that maximizes the output, and the goal is to find the input value under one million that maximuzes the total stopping time. (The problem can be found on project euler.
import Control.Applicative
import Control.Arrow
import Control.Monad.State
import qualified Data.Map.Strict as M
collatz :: Integer -> Integer
collatz n = if odd n
then 3 * n + 1
else n `div` 2
memoCollatz :: Integer
-> State (M.Map Integer Int, (Integer,Int)) Int
memoCollatz 1 = return 1
memoCollatz n = do
result <- gets (M.lookup n . fst)
case result of
Nothing -> do
l <- succ <$> memoCollatz (collatz n)
let update p@(_,curMaxV) =
if l > curMaxV
then (n,l)
else p
modify (M.insert n l *** update)
return l
Just v -> return v
main :: IO ()
main = print $ snd (execState (mapM_ memoCollatz [1..limit]) (M.empty,(1,1)))
where
limit = 1000000
The program works fine but is really slow. So I want to spend some time figuring out how to make it work faster.
I took a look at the profiling chapter of RWH, but have no clue about what is the problem:
I compiled it using ghc -O2 -rtsopts -prof -auto-all -caf-all -fforce-recomp
, and ran it with +RTS -s -p
and here is the result:
6,633,397,720 bytes allocated in the heap
9,357,527,000 bytes copied during GC
2,616,881,120 bytes maximum residency (15 sample(s))
60,183,944 bytes maximum slop
5274 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 10570 colls, 0 par 3.36s 3.36s 0.0003s 0.0013s
Gen 1 15 colls, 0 par 7.03s 7.03s 0.4683s 3.4337s
INIT time 0.00s ( 0.00s elapsed)
MUT time 4.02s ( 4.01s elapsed)
GC time 10.39s ( 10.39s elapsed)
RP time 0.00s ( 0.00s elapsed)
PROF time 0.00s ( 0.00s elapsed)
EXIT time 0.16s ( 0.16s elapsed)
Total time 14.57s ( 14.56s elapsed)
%GC time 71.3% (71.3% elapsed)
Alloc rate 1,651,363,842 bytes per MUT second
Productivity 28.7% of total user, 28.7% of total elapsed
And the .prof
file:
total time = 4.08 secs (4080 ticks @ 1000 us, 1 processor)
total alloc = 3,567,324,056 bytes (excludes profiling overheads)
COST CENTRE MODULE %time %alloc
memoCollatz Main 84.9 91.9
memoCollatz.update Main 10.5 0.0
main Main 2.4 5.8
collatz Main 2.2 2.3
individual inherited
COST CENTRE MODULE no. entries %time %alloc %time %alloc
MAIN MAIN 52 0 0.0 0.0 100.0 100.0
main Main 105 0 0.0 0.0 0.0 0.0
CAF:main1 Main 102 0 0.0 0.0 0.0 0.0
main Main 104 1 0.0 0.0 0.0 0.0
CAF:main2 Main 101 0 0.0 0.0 0.0 0.0
main Main 106 0 0.0 0.0 0.0 0.0
CAF:main4 Main 100 0 0.0 0.0 0.0 0.0
main Main 107 0 0.0 0.0 0.0 0.0
CAF:main5 Main 99 0 0.0 0.0 94.4 86.7
main Main 108 0 1.4 0.9 94.4 86.7
memoCollatz Main 113 0 82.4 85.8 92.9 85.8
memoCollatz.update Main 115 2168610 10.5 0.0 10.5 0.0
CAF:main10 Main 98 0 0.0 0.0 5.1 11.0
main Main 109 0 0.4 2.7 5.1 11.0
memoCollatz Main 112 3168610 2.5 6.0 4.7 8.3
collatz Main 114 2168610 2.2 2.3 2.2 2.3
CAF:main11 Main 97 0 0.0 0.0 0.5 2.2
main Main 110 0 0.5 2.2 0.5 2.2
main.limit Main 111 1 0.0 0.0 0.0 0.0
CAF GHC.Conc.Signal 94 0 0.0 0.0 0.0 0.0
CAF GHC.IO.Encoding 89 0 0.0 0.0 0.0 0.0
CAF GHC.IO.Encoding.Iconv 88 0 0.0 0.0 0.0 0.0
CAF GHC.IO.Handle.FD 82 0 0.0 0.0 0.0 0.0
What I can see is that the garbage collector is taking too much time and the program has spent most of its time running memoCollatz
.
And here are two screenshots from heap profiling:
I expect the memory usage to increase and then decrease rapidly because the program is doing memoization using a Map, but not sure what is causing the rapid drop in the graph (maybe this is a bug when visualizing the result?).
I want to know how to analyze these tables / graphs and how they indicates the real problem.
Upvotes: 2
Views: 202
Reputation: 52049
The Haskell Wiki contains a couple of different solutions to this problem: (link)
The fastest solution there uses an Array to memoize the results. On my machine it runs in about 1 second and max. residency is about 35 MB.
Below is a version which runs in about 0.3 seconds and uses 1/4 of the memory of the Array version but it runs in the IO monad.
There are trade-offs between all of the different versions, and you have to decide which one you consider acceptable.
{-# LANGUAGE BangPatterns #-}
import Data.Array.IO
import Data.Array.Unboxed
import Control.Monad
collatz x
| even x = div x 2
| otherwise = 3*x+1
solve n = do
arr <- newArray (1,n) 0 :: IO (IOUArray Int Int)
writeArray arr 1 1
let eval :: Int -> IO Int
eval x = do
if x > n
then fmap (1+) $ eval (collatz x)
else do d <- readArray arr x
if d == 0
then do d <- fmap (1+) $ eval (collatz x)
writeArray arr x d
return d
else return d
go :: (Int,Int) -> Int -> IO (Int,Int)
go !m x = do d <- eval x
return $ max m (d,x)
foldM go (0,0) [2..n]
main = solve 1000000 >>= print
Upvotes: 1