MasterMastic

Reputation: 21306

Why does a strict length function perform noticeably faster?

I toyed around with definitions to better understand the evaluation model, and wrote two for the length of a list.

The naive definition:

len :: [a] -> Int
len [] = 0
len (_:xs) = 1 + len xs

The strict (and tail-recursive) definition:

slen :: [a] -> Int -> Int
slen [] n = n
slen (_:xs) !n = slen xs (n+1)

len [1..10000000] takes about 5-6 seconds to run.
slen [1..10000000] 0 takes about 3-4 seconds to run.
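For anyone who wants to reproduce this, here is a minimal sketch of a complete test module (the module layout and main are just for illustration); note that the BangPatterns extension is needed for the !n pattern:

{-# LANGUAGE BangPatterns #-}
module Main where

len :: [a] -> Int
len [] = 0
len (_:xs) = 1 + len xs

slen :: [a] -> Int -> Int
slen [] n = n
slen (_:xs) !n = slen xs (n+1)

main :: IO ()
main = do
  print (len  [1..10000000 :: Int])
  print (slen [1..10000000 :: Int] 0)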

I'm curious why. Before I checked their performance I was sure they would perform about the same, because len should have at most one more thunk to evaluate. For demonstration purposes:

len [a,b,c,d]
= 1 + len [b,c,d]
= 1 + 1 + len [c,d]
= 1 + 1 + 1 + len [d]
= 1 + 1 + 1 + 1 + len []
= 1 + 1 + 1 + 1 + 0
= 4

And

slen [a,b,c,d] 0
= slen [b,c,d] 1
= slen [c,d]   2
= slen [d]     3
= slen []      4
= 4

What makes slen noticeably faster?

P.S. I also wrote a tail-recursive lazy function (just like slen but lazy) in an attempt to close in on the reason -- maybe it's because it was tail-recursive -- but it performed about the same as the naive definition.
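For reference, that lazy variant looked essentially like this (the name lslen is just for illustration):

lslen :: [a] -> Int -> Int
lslen []     n = n
lslen (_:xs) n = lslen xs (n+1)  -- without the bang, n+1 stays an unevaluated thunk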

Upvotes: 8

Views: 173

Answers (2)

Christian Conkle

Reputation: 5992

David Young's answer gives the correct explanation of the difference in evaluation order. You should think about Haskell evaluation in the way he outlines.

Let me show you how you can see the difference in the Core. I think it's actually more visible with optimizations on, because the evaluation ends up as an explicit case statement. If you've never played with Core before, see the canonical SO question on the topic: Reading GHC Core.

Generate the core output with ghc -O2 -ddump-simpl -dsuppress-all -ddump-to-file SO27392665.hs. You'll see that GHC splits both len and slen into a recursive "worker" function, $wlen or $wslen, and a nonrecursive "wrapper" function. Because the vast majority of the time is spent in the recursive "workers," focus on them:

Rec {
$wlen
$wlen =
  \ @ a_arZ w_sOR ->
    case w_sOR of _ {
      [] -> 0;
      : ds_dNU xs_as0 ->
        case $wlen xs_as0 of ww_sOU { __DEFAULT -> +# 1 ww_sOU }
    }
end Rec }

len
len =
  \ @ a_arZ w_sOR ->
    case $wlen w_sOR of ww_sOU { __DEFAULT -> I# ww_sOU }

Rec {
$wslen
$wslen =
  \ @ a_arR w_sOW ww_sP0 ->
    case w_sOW of _ {
      [] -> ww_sP0;
      : ds_dNS xs_asW -> $wslen xs_asW (+# ww_sP0 1)
    }
end Rec }

slen
slen =
  \ @ a_arR w_sOW w1_sOX ->
    case w1_sOX of _ { I# ww1_sP0 ->
    case $wslen w_sOW ww1_sP0 of ww2_sP4 { __DEFAULT -> I# ww2_sP4 }
    }

You can see that $wslen has only one case, while $wlen has two. If you go look at David's answer, you can trace what happens in $wlen: it does its case analysis on the outermost list constructor ([]/:), then makes the recursive call to $wlen xs_as0 (i.e. len xs), and cases on that result as well, i.e. forces it before adding 1.

In $wslen, on the other hand, there's only the one case statement. In the recursive branch, there's simply an unboxed addition, (+# ww_sP0 1), which doesn't create a thunk.

(Note: a previous version of this answer had stated that with -O GHC could specialize $wslen but not $wlen to use unboxed Int#s. That's not the case.)
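As an aside, the bang pattern isn't the only way to get this worker: a seq-based variant should compile to essentially the same thing (a sketch; I haven't compared its Core side by side):

slenSeq :: [a] -> Int -> Int
slenSeq []     n = n
slenSeq (_:xs) n = n `seq` slenSeq xs (n + 1)  -- force the accumulator before recursing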

Upvotes: 4

David Young

Reputation: 10793

The final step of len is not O(1). It is O(n) to add together n numbers. len also uses O(n) memory while slen uses O(1) memory.

The reason it uses O(n) memory is that each thunk uses up some memory. So when you have something like this:

1 + 1 + 1 + 1 + len []

there are five unevaluated thunks (including len [])

In GHCi, we can examine this thunk behavior a little more easily with the :sprint command. The :sprint command prints the given value without forcing the evaluation of any thunks (you can learn more from :help). I'll use conses ((:)) since we can more easily evaluate each thunk one at a time, but the principle is the same.

λ> let ys = map id $ 1 : 2 : 3 : [] :: [Int] -- map id prevents GHCi from being too eager here
λ> :sprint ys
ys = _
λ> take 1 ys
[1]
λ> :sprint ys
ys = 1 : _
λ> take 2 ys
[1,2]
λ> :sprint ys
ys = 1 : 2 : _
λ> take 3 ys
[1,2,3]
λ> :sprint ys
ys = 1 : 2 : 3 : _
λ> take 4 ys
[1,2,3]
λ> :sprint ys
ys = [1,2,3]

Unevaluated thunks are represented by _ and you can see that in the original ys there are 4 thunks nested inside of each other, one for each part of the list (including []).

There isn't a good way that I know of to see this in Int because its evaluation is more all or nothing, but it still builds up a nested thunk in the same way. If you could see it, the evaluation would look something like this:

len [a,b,c,d]
= 1 + len [b,c,d]
= 1 + 1 + len [c,d]
= 1 + 1 + 1 + len [d]
= 1 + 1 + 1 + 1 + len []
= 1 + 1 + 1 + 1 + 0
= 1 + 1 + 1 + 1       -- Here it stops building the thunks and starts evaluating them
= 1 + 1 + 2
= 1 + 3
= 4
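For what it's worth, this strict-accumulator pattern is exactly what a strict left fold packages up; a minimal sketch using Data.List.foldl' (the name lenFold is just for illustration):

import Data.List (foldl')

-- foldl' forces the accumulator at each step, so no chain of (+1) thunks
-- builds up: O(1) space, like slen.
lenFold :: [a] -> Int
lenFold = foldl' (\n _ -> n + 1) 0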

Upvotes: 9
