Simple Haskell Performance Analysis

Question

I am trying to compare the perf of two implementations of finding the Nth smallest number in a binary search tree. This is just a toy learning problem. My naïve attempt at measuring follows:

getNth :: Tree Int -> Int -> Either String Int

eval :: [Either b Int] -> Int
eval = (foldl (+) 0) . rights

main :: IO ()
main = do
    let t = foldl treeInsertBalanced EmptyTree [1..1000000]
        first = getNth t 100000
        iterations = 100000000000
        second = take iterations $ repeat $ getNth t 100
        third = take iterations $ repeat $ getNth' t 100

    print $ "dummy to cause eval: " ++ (show first)
    print ""

    time1 <- System.CPUTime.getCPUTime
    print $ eval second
    time2 <- System.CPUTime.getCPUTime
    print $ eval third
    time3 <- System.CPUTime.getCPUTime

    let secondTime = time2-time1
        thirdTime  = time3-time2
        timeDiff   = secondTime - thirdTime
    print $ "take version = " ++ (show secondTime)
    print $ "opt version  = " ++ (show thirdTime)
    print $ "diff         = " ++ (show timeDiff)

I am having a hard time figuring out how to include laziness where I want it, and prevent it where I do not.

I want the tree to be completely constructed before I begin measuring the functions that operate on it. This is why I try to force an evaluation of t by calling getNth on it and then printing it.

Is this doing what I hope it is doing.
Will t remain fully evaluated when I use it subsequently.

The difference in implementation between the two getNth functions is that the first uses the 'take' function on a simple depth first search of the tree. The second does a depth first search with an explicit early return. I want to know whether the simple 'take' implementation has to walk the whole tree or not. How can I determine that in a simpler way than measuring the performance of the two functions. I tried introducing an 'error' or an 'undefined' as a value in the tree, but of course neither was evaluated unless it was the Nth element in the tree. Is there another, simple way of determining whether the getNth function is truly laze or not?

ErikR · Accepted Answer

(Available as a .lhs at http://pastebin.com/jpg0vSNd )

Some observations:

A good way to force evalution of a value is to use deepseq from Control.DeepSeq.
repeat does not re-evaluate it's argument.
GHC is pretty good at spotting expression which are the same, so sometimes you have to disguise your function calls with identical arguments to make GHC re-evalute the function call.

Here is an example of using deepseq:

 import Control.DeepSeq (deepseq)
 import Control.Monad
 import Debug.Trace
 import System.TimeIt
 import System.Environment

 theList = [1..8] ++ [undefined] ++ [10] :: [Int]

 main1 = do
   print $ length theList
   print $ deepseq theList (length theList)

The first print statement emits 10. The second throws an exception because the deepseq call tried to evaluate the undefined element.

To see that repeat does not re-evaluate it's argument, consider this example:

 foo = repeat $ trace "(here)" 2

 main2 = print $ take 3 foo

The result of running main2 is:

[(here)
2,2,2]

What is happening is that when the head of foo is called for repeat evaluate it's argument. This calls trace which prints (here) and returns 2. This value is saved by repeat when the rest of the list foo is needed.

Finally, here is a demonstration of how good GHC is at spotting function calls with identical arguemnts.

 fib :: Int -> Int
 fib 0 = 0
 fib 1 = 1
 fib n = fib (n-1) + fib (n-2)

 theN = 34  -- use 24 if running under ghci

 compute1 = fib theN
 compute2 k = fib theN
 compute3 k = fib (k+theN-k)

fib theN is just a function call which takes a while to compute (about 0.6 secs)

 loop1 n = forM_ [1..n] $ \_ -> print compute1
 loop2 n = forM_ [1..n] $ \k -> print (compute2 k)
 loop3 n = forM_ [1..n] $ \k -> print (compute3 k)

 timeLoop loop = do timeIt $ loop 1
                    timeIt $ loop 2
                    timeIt $ loop 3
                    timeIt $ loop 10

 main4 = timeLoop loop1
 main5 = timeLoop loop2
 main6 = timeLoop loop3

 main = do (arg:_) <- getArgs
           case arg of
             "4" -> main4
             "5" -> main5
             "6" -> main6

The run times depend on whether or not you compile with -O2 or not. Typical reuts are:

         w/o -O2    with -O2
main4     1 secs    0.1 sec
main5    13 secs    0.1 sec
main6    13 secs    1.0 sec

Some conclusions:

A top-level expression like compute1 is memoized.
Adding an ignored parameter (e.g. compute2) will fool GHC into recomputing a function call if -O2 is not used.
With -O2 a trickier way of disguising the function call may be needed to get GHC to re-evaluate it in a loop.

Simple Haskell Performance Analysis

Answers (1)

Related Questions