Reputation: 539
I'm trying to improve performance of this binary-trees benchmark from The Computer Language Benchmark Game. The idea is to build lots of binary trees to benchmark memory allocation. The Tree data definition looks like this:
data Tree = Nil | Node !Int !Tree !Tree
According to the problem statement, there's no need to store an Int
in every node and other languages don't have it.
I use GHC 8.2.2 and get the following RTS report when run the original code:
stack --resolver lts-10.3 --compiler ghc-8.2.2 ghc -- --make -O2 -threaded -rtsopts -funbox-strict-fields -XBangPatterns -fllvm -pgmlo opt-3.9 -pgmlc llc-3.9 binarytrees.hs -o binarytrees.ghc_run
./binarytrees.ghc_run +RTS -N4 -sstderr -K128M -H -RTS 21
...
19,551,302,672 bytes allocated in the heap
7,291,702,272 bytes copied during GC
255,946,744 bytes maximum residency (18 sample(s))
233,480 bytes maximum slop
635 MB total memory in use (0 MB lost due to fragmentation)
...
Total time 58.620s ( 39.281s elapsed)
So far so good. Let's remove this Int, which is actually never used. The definition becomes
data Tree = Nil | Node !Tree !Tree
In theory we are going to save about 25% of total memory (3 integers in every node instead of 4). Let's try it:
...
313,388,960 bytes allocated in the heap
640,488 bytes copied during GC
90,016 bytes maximum residency (2 sample(s))
57,872 bytes maximum slop
5 MB total memory in use (0 MB lost due to fragmentation)
...
Total time 9.596s ( 9.621s elapsed)
5MB total memory in use and almost zero GC? Why? Where did all the allocations go?
Upvotes: 2
Views: 165
Reputation: 539
I believe the sudden memory usage drop caused by the Common Sub-expression Elimination optimization. The original code was:
make i d = Node i (make d d2) (make d2 d2)
-- ^ ^
-- | d2 != d
-- d != d2
Since expressions constructing the left and the right subtrees are different, the compiler is not able eliminate any allocations.
If I remove the unused integer, the code looks like this
make d = Node (make (d - 1)) (make (d - 1))
-- ^ ^
-- | |
-- `--------------`----- identical
If I add the -fno-cse
flag to GHC, the memory allocation is as high as expected, but the code is rather slow. I couldn't find a way to suppress this optimization locally so I decided to "outsmart" the compiler by adding extra unused arguments:
make' :: Int -> Int -> Tree
make' _ 0 = Node Nil Nil
make' !n d = Node (make' (n - 1) (d - 1)) (make' (n + 1) (d - 1))
The trick worked, the memory usage dropped by expected 30%. But I wish there was a nicer way to tell the compiler what I want.
Thanks to @Carl for mentioning the CSE optimization.
Upvotes: 3