Torinthiel
Torinthiel

Reputation: 1020

Why is this code not constant-space?

I'm learning Haskell currently (being a programmer by trade, but this is my first attempt at a functional language).

I want to write a function that scans a list and returns both the minimum and maximum element of that list. Sort of what the Prelude functions minimum and maximum do, but both at the same time. I've come up with the following code:

import Data.List

-- Declaration of rand

minMax :: [Int] -> Maybe (Int, Int)
minMax []   = Nothing
minMax (x:xs) = Just (foldl' f (x, x) xs)
                where
                  f (a, b) c = (if c < a then c else a, if c > b then c else b)

rand is a function that generates an infinite list of numbers. The thing is that when I append the following main function:

main = print $ minMax $ take 1000000 $ rand 7666532

compile and run all this with profiling, it shows me it uses over 200 MB of memory, so it's definitely not a constant-space function (which I'd like it to be).

The question is why and what should I change to fix it. As I understand foldl' folds the list from left (same way it's generated) and is not lazy, so I don't see why the memory usage is so high. I'm pretty sure it's the minMax function that is incorrect, as simply printing the said list, using

main = print $ take 1000000 $ rand 7666532

gives me 1MB usage, something that I understand and expect.

Upvotes: 33

Views: 1515

Answers (2)

obadz
obadz

Reputation: 909

The seq function which is used in foldl' essentially forces its first argument to be evaluated down to WHNF (Weak Head Normal Form).

As explained here, WHNF evaluation stops once you each the application of a constructor. (a, b) is therefore always in WHNF, even if a and b are thunks since you're hitting the Tuple constructor (,) before getting to a and b.

As a result, this space leaks despite the use of foldl':

foldl' (\ (a, b) x -> (a + x, b + x)) (0, 1) [1..1000000]

but this doesn't:

foldl' (\ (a, b) x -> a `seq` b `seq` (a + x, b + x)) (0, 1) [1..10000000]

It's also sometimes convenient to use the -XBangPatterns extension to write this:

foldl' (\ (!a, !b) x -> (a + x, b + x)) (0, 1) [1..10000000]

Upvotes: 12

Bakuriu
Bakuriu

Reputation: 101979

Note that foldl' forces the accumulator to weak head normal form. Since the accumulator is a tuple it does not force the evaluation of the two elements of the tuple.

If you explicitly force the two elements you get a constant-space function:

f (a, b) c = a `seq` b `seq` (if c < a then c else a, if c > b then c else b)

In your original program you are building a tuple of the kind: (<thunk>, <thunk>) and every time f is applied you simply build a tuple with bigger and bigger thunks. When finally this is consumed by print the call to show forces the full evaluation of the tuple and all the comparisons are made at that point.

Using seq you instead force f to evaluate the comparison at that moment, and thus the thunks contained in the accumulator are evaluated before performing the comparison. Hence the result is that the thunks stored in the accumulator have constant size.

What foldl' does is simply avoid building the thunk: f (f (f ...) y) x.

An alternative solution, as suggested by Jubobs, to avoid explicitly using seq is to use a data type that has strict fields:

data Pair a b = Pair !a !b
    deriving Show

And so the code would become:

minMax :: [Int] -> Maybe (Pair Int Int)
minMax []   = Nothing
minMax (x:xs) = Just (foldl' f (Pair x x) xs)
                where
                  f (Pair a b) c = Pair (if c < a then c else a) (if c > b then c else b)

This avoids thunks altogether.

Upvotes: 26

Related Questions