Reputation: 1020
I'm learning Haskell currently (being a programmer by trade, but this is my first attempt at a functional language).
I want to write a function that scans a list and returns both the minimum and maximum element of that list. Sort of what the Prelude functions minimum
and maximum
do, but both at the same time. I've come up with the following code:
import Data.List
-- Declaration of rand
minMax :: [Int] -> Maybe (Int, Int)
minMax [] = Nothing
minMax (x:xs) = Just (foldl' f (x, x) xs)
where
f (a, b) c = (if c < a then c else a, if c > b then c else b)
rand
is a function that generates an infinite list of numbers.
The thing is that when I append the following main
function:
main = print $ minMax $ take 1000000 $ rand 7666532
compile and run all this with profiling, it shows me it uses over 200 MB of memory, so it's definitely not a constant-space function (which I'd like it to be).
The question is why and what should I change to fix it. As I understand foldl'
folds the list from left (same way it's generated) and is not lazy, so I don't see why the memory usage is so high. I'm pretty sure it's the minMax
function that is incorrect, as simply printing the said list, using
main = print $ take 1000000 $ rand 7666532
gives me 1MB usage, something that I understand and expect.
Upvotes: 33
Views: 1515
Reputation: 909
The seq
function which is used in foldl'
essentially forces its first argument to be evaluated down to WHNF (Weak Head Normal Form).
As explained here, WHNF evaluation stops once you each the application of a constructor. (a, b)
is therefore always in WHNF, even if a
and b
are thunks since you're hitting the Tuple constructor (,)
before getting to a
and b
.
As a result, this space leaks despite the use of foldl'
:
foldl' (\ (a, b) x -> (a + x, b + x)) (0, 1) [1..1000000]
but this doesn't:
foldl' (\ (a, b) x -> a `seq` b `seq` (a + x, b + x)) (0, 1) [1..10000000]
It's also sometimes convenient to use the -XBangPatterns
extension to write this:
foldl' (\ (!a, !b) x -> (a + x, b + x)) (0, 1) [1..10000000]
Upvotes: 12
Reputation: 101979
Note that foldl'
forces the accumulator to weak head normal form. Since the accumulator is a tuple it does not force the evaluation of the two elements of the tuple.
If you explicitly force the two elements you get a constant-space function:
f (a, b) c = a `seq` b `seq` (if c < a then c else a, if c > b then c else b)
In your original program you are building a tuple of the kind: (<thunk>, <thunk>)
and every time f
is applied you simply build a tuple with bigger and bigger thunks. When finally this is consumed by print
the call to show
forces the full evaluation of the tuple and all the comparisons are made at that point.
Using seq
you instead force f
to evaluate the comparison at that moment, and thus the thunks contained in the accumulator are evaluated before performing the comparison. Hence the result is that the thunks stored in the accumulator have constant size.
What foldl'
does is simply avoid building the thunk: f (f (f ...) y) x
.
An alternative solution, as suggested by Jubobs, to avoid explicitly using seq
is to use a data type that has strict fields:
data Pair a b = Pair !a !b
deriving Show
And so the code would become:
minMax :: [Int] -> Maybe (Pair Int Int)
minMax [] = Nothing
minMax (x:xs) = Just (foldl' f (Pair x x) xs)
where
f (Pair a b) c = Pair (if c < a then c else a) (if c > b then c else b)
This avoids thunks altogether.
Upvotes: 26