Reputation: 21306
I toyed around with definitions to better understand the evaluation model, and wrote two for the length of a list.
The naive definition:
len :: [a] -> Int
len [] = 0
len (_:xs) = 1 + len xs
The strict (and tail-recursive) definition:
slen :: [a] -> Int -> Int
slen [] n = n
slen (_:xs) !n = slen xs (n+1)
len [1..10000000]
takes about 5-6 seconds to perform.
slen [1..10000000] 0
takes about 3-4 seconds to perform.
I'm curious why. Before I checked the performances I was positive that they would perform about the same because len
should have only one more thunk to evaluate at most. For demonstration purposes:
len [a,b,c,d]
= 1 + len [b,c,d]
= 1 + 1 + len [c,d]
= 1 + 1 + 1 + len [d]
= 1 + 1 + 1 + 1 + len []
= 1 + 1 + 1 + 1 + 0
= 4
And
slen [a,b,c,d] 0
= slen [b,c,d] 1
= slen [c,d] 2
= slen [d] 3
= slen [] 4
= 4
What makes slen
noticeably faster?
P.S. I also wrote a tail-recursive lazy function (just like slen
but lazy) as an attempt to close-in on the reason -- maybe it's because it was tail-recursive -- but it performed about the same as the naive definition.
Upvotes: 8
Views: 173
Reputation: 5992
David Young's answer gives the correct explanation of the difference in evaluation order. You should think about Haskell evaluation in the way he outlines.
Let me show you how you can see the difference in the Core. I think it's actually more visible with optimizations on, because the evaluation ends up as an explicit case
statement. If you've never played with Core before, see the canonical SO question on the topic: Reading GHC Core.
Generate the core output with ghc -O2 -ddump-simpl -dsuppress-all -ddump-to-file SO27392665.hs
. You'll see that GHC splits both len
and slen
into a recursive "worker" function, $wlen
or $wslen
, and a nonrecursive "wrapper" function. Because the vast majority of the time is spent in the recursive "workers," focus on them:
Rec {
$wlen
$wlen =
\ @ a_arZ w_sOR ->
case w_sOR of _ {
[] -> 0;
: ds_dNU xs_as0 ->
case $wlen xs_as0 of ww_sOU { __DEFAULT -> +# 1 ww_sOU }
}
end Rec }
len
len =
\ @ a_arZ w_sOR ->
case $wlen w_sOR of ww_sOU { __DEFAULT -> I# ww_sOU }
Rec {
$wslen
$wslen =
\ @ a_arR w_sOW ww_sP0 ->
case w_sOW of _ {
[] -> ww_sP0;
: ds_dNS xs_asW -> $wslen xs_asW (+# ww_sP0 1)
}
end Rec }
slen
slen =
\ @ a_arR w_sOW w1_sOX ->
case w1_sOX of _ { I# ww1_sP0 ->
case $wslen w_sOW ww1_sP0 of ww2_sP4 { __DEFAULT -> I# ww2_sP4 }
}
You can see that $wslen
has only one case
, while $wlen
has two. If you go look at David's answer, you can trace what happens in $wlen
: it does its case analysis on the outermost list constructor ([]
/:
), then makes the recursive call to $wlen xs_as0
(i.e. len xs
), which it also case
s, i.e. forces the accumulated thunk.
In $wslen
, on the other hand, there's only the one case
statement. In the recursive branch, there's simply an unboxed addition, (+# ww_sP0 1)
, which doesn't create a thunk.
(Note: a previous version of this answer had stated that with -O
GHC could specialize $wslen
but not $wlen
to use unboxed Int#
s. That's not the case.)
Upvotes: 4
Reputation: 10793
The final step of len
is not O(1). It is O(n) to add together n numbers. len
also uses O(n) memory while slen
uses O(1) memory.
The reason it uses O(n) memory is that each thunk uses up some memory. So when you have something like this:
1 + 1 + 1 + 1 + len []
there are five unevaluated thunks (including len []
)
In GHCi, we can examine this thunk behavior a little easier with the :sprint
command. The :sprint
command prints the given value without forcing the evaluating of any thunks (you can learn more from :help
). I'll use conses ((:)
) since we can more easily evaluate each thunk one at a time, but the principle is the same.
λ> let ys = map id $ 1 : 2 : 3 : [] :: [Int] -- map id prevents GHCi from being too eager here
λ> :sprint ys
ys = _
λ> take 1 ys
[1]
λ> :sprint ys
ys = 1 : _
λ> take 2 ys
[1,2]
λ> :sprint ys
ys = 1 : 2 : _
λ> take 3 ys
[1,2,3]
λ> :sprint ys
ys = 1 : 2 : 3 : _
λ> take 4 ys
[1,2,3]
λ> :sprint ys
ys = [1,2,3]
Unevaluated thunks are represented by _
and you can see that in the original ys
there are 4
thunks nested inside of each other, one for each part of the list (including []
).
There isn't a good way that I know of to see this in Int
because its evaluation is more all or nothing, but it still builds up a nested thunk in the same way. If you could see it like this, its evaluation would look something like this:
len [a,b,c,d]
= 1 + len [b,c,d]
= 1 + 1 + len [c,d]
= 1 + 1 + 1 + len [d]
= 1 + 1 + 1 + 1 + len []
= 1 + 1 + 1 + 1 + 0
= 1 + 1 + 1 + 1 -- Here it stops building the thunks and starts evaluating them
= 1 + 1 + 2
= 1 + 3
= 4
Upvotes: 9