Richard
Richard

Reputation: 16772

Why doesn't my Haskell function support infinite lists?

I don't understand why my implementation of this function for a tutorial exercise fails to be lazy. It fails when running take 4 (chunks 3 [0..]). Please advise me what to change.

I'm not doing anything that I think would cause an infinite list to be evaluated:

--   chunks 2 [1,2,3,4] ==> [[1,2],[2,3],[3,4]]
--   take 4 (chunks 3 [0..]) ==> [[0,1,2],[1,2,3],[2,3,4],[3,4,5]]

chunks :: Int -> [a] -> [[a]]
chunks len input = chunks' len input []

chunks' :: Int -> [a] -> [[a]] -> [[a]]
chunks' _ [] output = output
chunks' len input@(input1:inputTail) output
    | lengthAtLeast len input = chunks' len inputTail (output++[take len input])
    | otherwise               = output
    where 
        lengthAtLeast len list = length (take len list) >= len

Upvotes: 6

Views: 181

Answers (2)

Ben
Ben

Reputation: 71495

In addition to chi's excellent answer about your specific code, I think it's worth elaborating on the point that tail recursion was the wrong approach; you might need to rethink how you approach these functions in general.

It appears you wanted to write your algorithm using tail recursion; I presume this is because of the common advice that tail recursion is more efficient and/or avoids stack overflow errors. However that advice comes from strict languages (that do tail call elimination; it doesn't help if they don't have this optimisation). In a lazy language like Haskell tail recursion sometimes improves performance, and sometimes makes it much worse! So you don't want to apply the accumulator introduction trick blindly as a matter of course. ("Accumulator introduction" is a fancy name for the technique you used in adding the extra output parameter to chunks'. It is so called because you "accumulate" the final result in this parameter as you get deeper in the call stack, only returning it in the base case.)

In particular tail recursive code like this almost by definition cannot work for a function that is supposed to "stream" its (potentially infinite) result gradually from its (potentially infinite) input. By "stream" I mean that the function will only need to examine a part of its input in order to produce a usable part of its output, so it's our caller that decides how much of the input needs to be examined (by deciding how much of our output it wants to examine).

Think about it; tail recursion with an accumulator parameter means the recursive calls only ever return another call to this function, or the final result in one go. The caller doesn't get anything until they have forced all of those layers of calls all the way down to the base case; this is the opposite of lazily streaming a result, and cannot possibly work on an infinite input where it will never actually hit the base case. It also means the entirety of the input had to be examined (and thus forced all at once if the input itself was the result of a lazy streaming computation), and the output also has to be produced in its entirety and exist in memory all at once (or sometimes worse, a large chain of thunks).

Whereas the style chi suggested is exactly what you would avoid in a strict language. There is no accumulator, and the recursive call to chunks' is not in tail position. chi's chunks' returns a structured value with a recursive call to chunks' contained inside one of its fields (specifically a list cell produced with the : constructor, with the recursive call in the second field). In a strict language the recursive call would have to be evaluated (all the way to its base case) before the list cell containing it could be built, which means allocating a stack frame for every level of the recursion and could cause a stack overflow. Hence the advice to restructure your code to use tail recursion, which (if the language has tail call elimination) reuses the same stack frame for every level of the recursion.

In Haskell, however, laziness means the recursive call to chunks' doesn't need to be evaluated immediately. It can be stored in the second field of the list cell as an unevaluated thunk, which means the caller can immediately inspect the first field of the list cell without any more work being spent on chunks'. That property is what makes "streaming" work, and allows the processing of infinite data structures. There is no need for tail recursion to avoid a stack overflow; as you've seen tail recursion makes it worse.

This code structure, where you put a recursive call inside a field of a data structure you return, is known as guarded recursion. In Haskell guarded recursion is usually more important than tail recursion; certainly more important when you want to handle infinite (or even just "large") data structures.

Upvotes: 4

chi
chi

Reputation: 116139

This is an issue:

chunks' len input@(input1:inputTail) output
    | lengthAtLeast len input = chunks' len inputTail (output++[take len input])

Note that the condition lengthAtLeast ... will be always true for infinite lists. Hence, we always fall in this case -- which recurses without producing any part of the output. The output would be returned only when we fall in the other cases, but that never happens.

Instead, make sure the result is produced before we recurse:

chunks' len input@(input1:inputTail) output
    | lengthAtLeast len input = take len input : chunks' len inputTail output

Of course, after this change output will always be the empty list, and we can remove the argument, further simplifying the code.

The thumb rule is: when dealing with infinite lists, tail recursion will lead to infinite loops. You want to generate at least part of the result before recursing. In general, tail recursion is the wrong approach when your output is a large data structure like a list, which should instead be generated gradually so that we can benefit form laziness.

Upvotes: 8

Related Questions