Reputation: 2174
On https://wiki.haskell.org/99_questions/Solutions/5 there is this solution to reverse a list:
reverse :: [a] -> [a]
reverse [] = []
reverse (x:xs) = reverse xs ++ [x]
However, this definition is more wasteful than the one in the Prelude, as it repeatedly re-conses the result as it is accumulated. The following variation avoids that, and is thus computationally closer to the Prelude version.
reverse :: [a] -> [a]
reverse list = reverse' list []
where
reverse' [] reversed = reversed
reverse' (x:xs) reversed = reverse' xs (x:reversed)
I'm trying to understand the difference between the two. What does "reconses" mean?
My idea is that reverse xs ++ [x] goes from the first element to the last and then adds the x, which takes n iterations (n = length of xs). In the second, it appends the rest of the list to x. But I don't know Haskell internals well enough to know how it would differ from the other example.
What happens exactly?
Upvotes: 3
Views: 131
Reputation: 70277
Lists are defined as follows (pseudocode, since there's some syntactic sugar involved):
data [a] = [] | a : [a]
That is, a list is either empty or it consists of a first element and then the rest of the list.
Conceptually, the list [1, 2, 3] is stored in memory somewhat like 1 : (2 : (3 : [])): a cell holding 1 pointing at a cell holding 2, pointing at a cell holding 3, pointing at the empty list.
If you're more comfortable in imperative languages, you can think of lists as consisting of a pointer to a "first" element and a pointer to the remainder of the list. If you've ever used a Lisp before, this should look very familiar.
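To make that picture concrete, the built-in list type can be re-created as an ordinary user-defined type; List, Nil, Cons, and toBuiltin below are illustrative names I'm introducing, not anything from the Prelude:

```haskell
-- A hypothetical re-creation of the built-in list type, making the
-- "first element plus pointer to the rest" structure explicit.
data List a = Nil | Cons a (List a)

-- Convert back to a real Haskell list, cell by cell.
toBuiltin :: List a -> [a]
toBuiltin Nil         = []
toBuiltin (Cons x xs) = x : toBuiltin xs

main :: IO ()
main = print (toBuiltin (Cons 1 (Cons 2 (Cons 3 Nil))) == [1, 2, 3 :: Int])  -- True
```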
Now, let's see what (++) does:
(++) :: [a] -> [a] -> [a]
(++) [] ys = ys
(++) (x:xs) ys = x : (xs ++ ys)
Because it's easy to put things at the beginning of a list, (++) takes each element of its left-hand-side list and puts it onto the right-hand-side list. The right-hand-side list is left alone for the duration of this. That is, because of the way lists are stored, [1] ++ [2..10000] is fast, but [1..9999] ++ [10000] is slow. The time complexity (ignoring laziness) of (++) is linear in the length of its first argument and independent of its second.
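One way to see that asymmetry concretely is a sketch with a hypothetical appendSteps helper (my name, not a library function) that mirrors the definition of (++) above but also counts how many cons cells the append itself allocates:

```haskell
-- Hypothetical helper: append two lists, also counting the (:) cells
-- that the append allocates. Mirrors the definition of (++) above.
appendSteps :: [a] -> [a] -> ([a], Int)
appendSteps []     ys = (ys, 0)
appendSteps (x:xs) ys =
  let (rest, n) = appendSteps xs ys
  in (x : rest, n + 1)  -- one new cons cell per element of the LEFT list

main :: IO ()
main = do
  print (snd (appendSteps [1] [2 .. 10000 :: Int]))       -- 1
  print (snd (appendSteps [1 .. 9999] [10000 :: Int]))    -- 9999
```

The step count depends only on the left argument, matching the claim above.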
Now, your first reverse implementation is
reverse :: [a] -> [a]
reverse [] = []
reverse (x:xs) = reverse xs ++ [x]
This repeatedly appends a long list (reverse xs, the reversed "rest" of the input) onto a short list ([x]). Remember, the left-hand argument to (++) determines how fast it runs, and reverse xs is going to be long in general, so this is going to take a while; summed over the whole recursion, the appends make this implementation quadratic in the length of the list. It's also going to tear apart and rebuild the list over and over unnecessarily, which suggests we can do better. For instance, reverse [1, 2, 3, 4] is going to do the following
is going to do the following
reverse [1, 2, 3, 4]
reverse [2, 3, 4] ++ [1]
... lots of recursive work ...
[4, 3, 2] ++ [1]
... lots more recursive work ...
[4, 3, 2, 1]
While the first recursive step is necessary (we have to recurse to reverse the list, of course), the second one just rips apart all of that hard work we just did and builds it back up anyway. It would be really nice if we could avoid that.
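The cost of all that ripping apart and rebuilding can be tallied up: for a list of length n, the naive reverse performs (n-1) + (n-2) + ... + 0 = n(n-1)/2 cons operations in its appends alone. A small sketch, using a hypothetical naiveReverseSteps counter (again, my name, purely for illustration):

```haskell
-- Count the cons cells built by the appends in the naive reverse.
-- At each step, reverse xs ++ [x] walks the length-(length xs) list
-- reverse xs, so the total is (n-1) + (n-2) + ... + 0 = n*(n-1)/2.
naiveReverseSteps :: [a] -> Int
naiveReverseSteps []     = 0
naiveReverseSteps (_:xs) = naiveReverseSteps xs + length xs

main :: IO ()
main = print (naiveReverseSteps [1 .. 100 :: Int])  -- 4950
```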
reverse :: [a] -> [a]
reverse list = reverse' list []
where
reverse' [] reversed = reversed
reverse' (x:xs) reversed = reverse' xs (x:reversed)
Enter the accumulating reverse function. Here, we use an additional argument to "build up" the reversed list in the correct order. Rather than building the entire reversed list and then using (++) to add something to the end (remember, adding to the end of a big list is inefficient), we keep track of everything we want at the end of the list and repeatedly put things at the beginning (putting things at the beginning of a linked list is incredibly efficient).
Compare the evaluation of the two reverse functions on even a small list like [1, 2, 3] (again, I'm assuming we're immediately forcing the value and hence ignoring laziness):
-- First function
reverse [1, 2, 3]
reverse [2, 3] ++ [1]
(reverse [3] ++ [2]) ++ [1]
((reverse [] ++ [3]) ++ [2]) ++ [1]
(([] ++ [3]) ++ [2]) ++ [1]
([3] ++ [2]) ++ [1]
(3 : ([] ++ [2])) ++ [1]
[3, 2] ++ [1]
3 : ([2] ++ [1])
3 : (2 : ([] ++ [1]))
[3, 2, 1]
-- Second function
reverse [1, 2, 3]
reverse' [1, 2, 3] []
reverse' [2, 3] [1]
reverse' [3] [2, 1]
reverse' [] [3, 2, 1]
[3, 2, 1]
Note how the second function is much cleaner: it shuffles around far less data and never rebuilds structures unnecessarily.
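As an aside, this accumulator pattern is exactly a strict left fold that conses each element onto the accumulator. The sketch below uses foldl' from Data.List; the Prelude's actual definition of reverse may differ in detail, but it follows the same accumulating strategy:

```haskell
import Data.List (foldl')

-- The accumulator pattern as a left fold: start with [], and cons
-- each element onto the front of the accumulator as we walk the list.
reverseAcc :: [a] -> [a]
reverseAcc = foldl' (flip (:)) []

main :: IO ()
main = print (reverseAcc [1, 2, 3 :: Int])  -- [3,2,1]
```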
Upvotes: 6