Marcel Krakowiak

Reputation: 87

Does calling map twice on the same list force it to iterate through the list twice?

I am a complete newbie in Haskell, and while reading a tutorial I found something like this:

Thanks to Haskell's laziness, even if you map something over a list several times and filter it several times, it will only pass over the list once.

Here's my simple example: x = map foo (map foo [1,2]) where foo p = p * 2

Now, two questions came to my mind.

  1. Why is it thanks to laziness? I know that laziness means that it does not evaluate an expression till it has to, but... how does it manage to iterate over the list once instead of the declared number of maps/filters?
  2. How can I verify that it iterates only once?
  3. Bonus question - I know that side effects are 🤮 but... it would be pretty nice if I could print something to debug, e.g.
foo p = 
    let printResult = print "Some helpful message"
    in p * 2

Upvotes: 3

Views: 300

Answers (2)

Ben

Reputation: 71400

One point that isn't covered explicitly in the other answer is that there is a sense in which laziness means "even if you map something over a list several times and filter it several times, it will only pass over the list once".

Let's look at what happens when we evaluate map foo (map foo [1, 2, 3]), but rather than the consumer being head as in Willem Van Onsem's answer, let's say we're printing it, so we need to evaluate the whole thing.

I don't want to trace the actual evaluation of print in much detail, so let's pretend we've got a special-purpose print for lists that looks something like this (and that putStr is a primitive):

print :: Show a => [a] -> IO ()
print [] = putStr "[]"
print (x : xs) = putStr "[" >> putStr (show x) >> print_rest xs
  where print_rest [] = putStr "]"
        print_rest (x : xs) = putStr ", " >> putStr (show x) >> print_rest xs

And of course we have:

map :: (a -> b) -> [a] -> [b]
map _ []     = []
map f (x:xs) = f x : map f xs

And:

foo p = p * 2

So, we start with:

print (map foo (map foo [1, 2, 3]))

Which is of course really:1

print (map foo (map foo (1 : 2 : 3 : []) ))

print needs to pattern match its argument, which requires forcing the outer map. The outer map in turn starts by needing to know whether its argument is empty or not, and thus forces the inner map call. The inner map also needs to pattern match its argument, but that is a concrete, non-empty list. So now we can return up the pattern-match stack and get this:

print (map foo (map foo (1 : 2 : 3 : []) ))
print (map foo (foo 1 : map foo (2 : 3 : []) ))
print (foo (foo 1) : map foo (map foo (2 : 3 : []) ))
putStr "[" >> putStr (show (foo (foo 1))) >> print_rest (map foo (map foo (2 : 3 : []) ))

Now we've got something the IO driver can work with (a putStr call). So it can consume that, resulting in the console output showing:

[

And we are left with the expression:

putStr (show (foo (foo 1))) >> print_rest (map foo (map foo (2 : 3 : []) ))

Which has another putStr at the head; this one just needs to force show (foo (foo 1)). show has to force foo (foo 1), which has to force foo 1 to 2; then the outer foo can produce 4 so show can produce "4".

putStr "4" >> print_rest (map foo (map foo (2 : 3 : []) ))

Letting IO consume that too we now have on the console output:

[4

And are left with this expression:

print_rest (map foo (map foo (2 : 3 : []) ))

By the same process as above we can go:

print_rest (map foo (map foo (2 : 3 : []) ))
print_rest (map foo (foo 2 : map foo (3 : []) ))
print_rest (foo (foo 2) : map foo (map foo (3 : []) ))
putStr ", " >> putStr (show (foo (foo 2))) >> print_rest (map foo (map foo (3 : []) ))
putStr ", " >> putStr (show (foo 4)) >> print_rest (map foo (map foo (3 : []) ))
putStr ", " >> putStr (show 8) >> print_rest (map foo (map foo (3 : []) ))
putStr ", " >> putStr "8" >> print_rest (map foo (map foo (3 : []) ))

Then the IO driver can consume the putStr calls so we can see:

[4, 8

And are left with the expression to evaluate:

print_rest (map foo (map foo (3 : []) ))

Which becomes (skipping over some steps now):

putStr ", " >> putStr (show (foo (foo 3))) >> print_rest (map foo (map foo [] ))
putStr ", " >> putStr "12" >> print_rest (map foo (map foo [] ))

IO does its magic thing so that we can see on the console:

[4, 8, 12

And finally the residual print_rest (map foo (map foo [] )) evaluates very straightforwardly to putStr "]" so we at last can see:

[4, 8, 12]

Now, let's reflect on what happened.

Lazy evaluation means we don't evaluate anything until it's needed, and the "need" ultimately comes from the IO driver needing to evaluate things until it has concrete primitives with fully evaluated arguments so that it can actually do something. That's why I chose the order that I did to evaluate things.

If you look over it you should notice that at no point did we ever produce an expression like map foo [2, 4, 6]. We evaluated both of the foo calls on the first element of the list before any of the mapping or printing even pattern matched to see if there was a second element of the list. This also means that the first element of the list (and both results of fooing it) became unreferenced and could be reclaimed by the garbage collector before the second element of the list was examined. And then the second element of the list was fully processed before the third was examined, too.
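
This interleaving is easy to observe with Debug.Trace (a small sketch; trace prints its message at the moment its value is forced, so the message order mirrors the evaluation order):

```haskell
import Debug.Trace (trace)

-- trace prints its message (to stderr) when the wrapped value is forced,
-- so the messages appear in evaluation order.
foo :: Int -> Int
foo p = trace ("foo " ++ show p) (p * 2)

main :: IO ()
main = print (map foo (map foo [1, 2, 3]))
```

The first two messages ("foo 1", then "foo 2" on its result) both concern the first element; they appear before any message for the second element, matching the walkthrough above.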

This is the sense in which laziness evaluates nested maps, filters, etc with only a single traversal of the base list. Sometimes this can result in large efficiency gains compared to an eager evaluation of the same nested maps, filters, etc. For example, if the base list was not a small concrete list like [1, 2, 3] but rather an expression that would lazily produce a very large (even infinite!) list, then our whole "multi-pass" sequence of maps could operate with only enough memory for one element at a time, rather than needing to produce the full base sequence, and then produce the full sequence at the next stage, and so on for each stage. It also means that we produce and consume all the intermediate stages of one value immediately, increasing the chance that we are working within the CPU cache or the shortest-lived most-efficient generation of the garbage collector's heap.
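
A quick sketch of the infinite-list case: the base list below never ends, yet the chained maps terminate because only the demanded prefix is ever built.

```haskell
foo :: Int -> Int
foo p = p * 2

-- [1 ..] is infinite; take 3 demands only three elements, so each stage
-- of the chained maps produces (and discards) one element at a time.
main :: IO ()
main = print (take 3 (map foo (map foo [1 ..])))
-- prints [4,8,12]
```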

However, if you look very carefully, you'll note that even though we "only traversed the list once", we still had exactly the same : constructor applications that we would have had if we had fully evaluated map foo [1, 2, 3] to [2, 4, 6] and then traversed that. We still allocated a constructor cell for foo 1 : _, then immediately consumed it and allocated foo 2 : _.

So in another sense "laziness means we only traverse the list once" is not true. When we do end up using the whole list (as opposed to using head or take to only inspect some of it), then lazy evaluation results in exactly the same work (meaning allocations and pattern matches) as if we did evaluate each stage in its entirety and then traverse the result for the next stage. We've rearranged the work, and the order produced by lazy evaluation is frequently better, but we haven't fundamentally changed the amount of work we have to do. The intermediate lists are still there in some sense, even if they're never present in memory all at once.

Fortunately GHC has many other optimizations that come into play to avoid actually creating the intermediate list cells (or indeed the base list!), in many cases. For example, GHC can recognise that map foo (map bar xs) is equivalent to map (foo . bar) xs, which has no intermediate list to traverse or allocate whether we're evaluating lazily or eagerly. Particularly simple cases like the example I've used would be heavily inlined, and probably end up essentially compiled to just directly print [4, 8, 12] without allocating or evaluating anything.
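
The equivalence itself is easy to check by hand (a minimal sketch; bar is a made-up second function for illustration):

```haskell
foo, bar :: Int -> Int
foo p = p * 2
bar p = p + 1

main :: IO ()
main = do
  print (map foo (map bar [1, 2, 3]))  -- conceptually two traversals
  print (map (foo . bar) [1, 2, 3])    -- one traversal, no intermediate list
-- both print [4,6,8]
```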

TLDR: There is a sense in which lazy evaluation does avoid repeatedly traversing intermediate lists when you chain functions like maps and filters. So the quote in the OP may well be accurate (depending on how its author intended it), but there's important nuance and it's important not to oversell it.

When the entirety of the result is needed, all the work of those traversals is still done; lazy evaluation just rearranges the order the work is done in. Which could still be a big win if it allows you to have dramatically less memory consumed by intermediate results, since they now may not have to all be present in memory at once.


1 Remember that square-bracket syntax for lists is just a nice way for humans to write out a sequence that is really represented in memory as a nested series of applications of the : constructor, terminated by the [] constructor. [1, 2, 3] is 1 : (2 : (3 : [])), and we don't need those inner parentheses because the : constructor is infix and right associative, so we can just write 1 : 2 : 3 : [].

Upvotes: 2

willeM_ Van Onsem

Reputation: 476534

Why is it thanks to laziness? I know that laziness means that it does not evaluate an expression till it has to, but... how does it manage to iterate over the list once instead of the declared number of maps/filters?

If you are only interested in the first element of a list, map will not evaluate the rest of the list.
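
One blunt way to see this (a sketch; the undefined tail would crash the program if it were ever forced):

```haskell
foo :: Int -> Int
foo p = p * 2

-- The tail of the list is undefined, so this only works because neither
-- map nor head ever forces anything beyond the first cons cell.
main :: IO ()
main = print (head (map foo (1 : undefined)))
-- prints 2
```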

Imagine that you are interested in evaluating map foo (map foo [1,2]) to head normal form.

map :: (a -> b) -> [a] -> [b] is implemented as [src]:

map :: (a -> b) -> [a] -> [b]
map _ []     = []
map f (x:xs) = f x : map f xs

It will thus evaluate the subexpression map foo [1,2] to head normal form: it has to know whether the outermost data constructor is the empty list [], or a "cons" (_:_).

So it will evaluate the second map, but only up to the outer data constructor. This thus means that it will see that the list [1,2] is not empty, and thus it will produce foo 1 : map foo [2]. It will not evaluate foo 1, since that is not necessary either.

Since we now know that the subexpression has a "cons" as data constructor, the outer map function will return foo (foo 1) : map foo (map foo [2]). It thus moved the "cursor" of the original list one step further, but it will not iterate over the entire list, nor will it evaluate the function foo on the elements.

If we want to print the head, for example, like print (head (map foo (map foo [1,2]))), it means that head will be evaluated, and head (foo (foo 1) : map foo (map foo [2])) will just return foo (foo 1). It will again not evaluate foo (foo 1). Now print will need to evaluate the first item, since it is supposed to print it to the output channel; this means that it evaluates foo (foo 1). In order to evaluate foo (foo 1), it first has to evaluate foo 1, since this is necessary to force evaluation of the multiplication; so foo 1 is evaluated to 2, and foo 2 is evaluated to 4.

But we thus saved a lot of cycles by not evaluating the entire list, or the foo (foo …) parts of other elements.

How can I verify that it iterates only once?

There are tools like visualize-cbn [GitHub] and ghc-vis [Hackage] that can help visualize how laziness is handled by Haskell. One can step through an evaluation and see how demanding one element forces others, which helps in understanding recursive patterns.
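
A lower-tech check is GHCi's :sprint command, which shows how much of a value has been evaluated so far, printing _ for unevaluated thunks (a sketch of a session; foo here is the plain, untraced version):

```
ghci> foo p = p * 2
ghci> xs = map foo (map foo [1,2]) :: [Int]
ghci> :sprint xs
xs = _
ghci> head xs
4
ghci> :sprint xs
xs = 4 : _
```

Nothing beyond the first cons cell has been evaluated, even though two maps are stacked on the list.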

Bonus question - I know that side effects are 🤮 but... it would be pretty nice if I could print something to debug, e.g.

One can make use of trace :: String -> a -> a to print a message, for example:

import Debug.Trace(trace)

foo p = trace "evaluated foo" (p * 2)

If you then evaluate print (head (map foo (map foo [1,2]))), you see that it only evaluates foo twice, not four times. Indeed:

ghci> import Debug.Trace(trace)
ghci> foo p = trace "evaluated foo" (p * 2)
ghci> print (head (map foo (map foo [1,2])))
evaluated foo
evaluated foo
4
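
For contrast, a small compiled-program sketch using the same traced foo: if the whole list is demanded rather than just its head, foo runs all four times (the inner and outer call for each of the two elements), though still in a single interleaved pass.

```haskell
import Debug.Trace (trace)

foo :: Int -> Int
foo p = trace "evaluated foo" (p * 2)

-- Printing the whole list forces every element, so "evaluated foo"
-- appears four times alongside the final [4,8].
main :: IO ()
main = print (map foo (map foo [1, 2]))
```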

Upvotes: 2
