Reputation: 2174
On https://wiki.haskell.org/99_questions/Solutions/5 there is this solution to reverse a list:
reverse :: [a] -> [a]
reverse [] = []
reverse (x:xs) = reverse xs ++ [x]
However, this definition is more wasteful than the one in the Prelude, as it repeatedly re-conses the result as it is accumulated. The following variation avoids that, and is thus computationally closer to the Prelude version.
reverse :: [a] -> [a]
reverse list = reverse' list []
where
reverse' [] reversed = reversed
reverse' (x:xs) reversed = reverse' xs (x:reversed)
I'm trying to understand the difference between the two. What does "reconses" mean?
My idea is that reverse xs ++ [x] goes from the first element to the last and then adds the x, which takes n iterations (n = length of xs). In the second, it appends the rest of the list to x. But I don't know Haskell internals well enough to know how it would differ from the other example.
What happens exactly?
Upvotes: 3
Views: 131
Reputation: 70277
Lists are defined as follows (pseudocode, since there's some syntactic sugar involved):
data [a] = [] | a : [a]
That is, a list is either empty or it consists of a first element and then the rest of the list.
Conceptually, the list [1, 2, 3] is stored in memory somewhat like 1 : (2 : (3 : [])): a cell holding 1 pointing at a cell holding 2, pointing at a cell holding 3, pointing at the empty list.
If you're more comfortable in imperative languages, you can think of lists as consisting of a pointer to a "first" element and a pointer to the remainder of the list. If you've ever used a Lisp before, this should look very familiar.
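To make that picture concrete, the built-in list type can be re-created as an ordinary user-defined type; List, Nil, Cons, and toBuiltin below are illustrative names I'm introducing, not anything from the Prelude:

```haskell
-- A hypothetical re-creation of the built-in list type, making the
-- "first element plus pointer to the rest" structure explicit.
data List a = Nil | Cons a (List a)

-- Convert back to a real Haskell list, cell by cell.
toBuiltin :: List a -> [a]
toBuiltin Nil         = []
toBuiltin (Cons x xs) = x : toBuiltin xs

main :: IO ()
main = print (toBuiltin (Cons 1 (Cons 2 (Cons 3 Nil))) == [1, 2, 3 :: Int])  -- True
```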
Now, let's see what (++) does:
(++) :: [a] -> [a] -> [a]
(++) [] ys = ys
(++) (x:xs) ys = x : (xs ++ ys)
Because it's easy to put things at the beginning of a list, (++) takes each element of its left-hand-side list and puts it onto the right-hand-side list. The right-hand-side list is left alone for the duration of this. That is, because of the way lists are stored, [1] ++ [2..10000] is fast, but [1..9999] ++ [10000] is slow. The time complexity (ignoring laziness) of (++) is linear in the length of its first argument and independent of its second.
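One way to see that asymmetry concretely is a sketch with a hypothetical appendSteps helper (my name, not a library function) that mirrors the definition of (++) above but also counts how many cons cells the append itself allocates:

```haskell
-- Hypothetical helper: append two lists, also counting the (:) cells
-- that the append allocates. Mirrors the definition of (++) above.
appendSteps :: [a] -> [a] -> ([a], Int)
appendSteps []     ys = (ys, 0)
appendSteps (x:xs) ys =
  let (rest, n) = appendSteps xs ys
  in (x : rest, n + 1)  -- one new cons cell per element of the LEFT list

main :: IO ()
main = do
  print (snd (appendSteps [1] [2 .. 10000 :: Int]))       -- 1
  print (snd (appendSteps [1 .. 9999] [10000 :: Int]))    -- 9999
```

The step count depends only on the left argument, matching the claim above.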
Now, your first reverse implementation is
reverse :: [a] -> [a]
reverse [] = []
reverse (x:xs) = reverse xs ++ [x]
This repeatedly appends a long list (reverse xs, the reversed "rest" of the input) onto a short list ([x]). Remember, the left-hand argument to (++) determines how fast it runs, and reverse xs is going to be long in general, so this is going to take a while; summed over the whole recursion, the appends make this implementation quadratic in the length of the list. It's also going to tear apart and rebuild the list over and over unnecessarily, which suggests we can do better. For instance, reverse [1, 2, 3, 4] is going to do the following
is going to do the following
reverse [1, 2, 3, 4]
reverse [2, 3, 4] ++ [1]
... lots of recursive work ...
[4, 3, 2] ++ [1]
... lots more recursive work ...
[4, 3, 2, 1]
While the first recursive step is necessary (we have to recurse to reverse the list, of course), the second one just rips apart all of that hard work we just did and builds it back up anyway. It would be really nice if we could avoid that.
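The cost of all that ripping apart and rebuilding can be tallied up: for a list of length n, the naive reverse performs (n-1) + (n-2) + ... + 0 = n(n-1)/2 cons operations in its appends alone. A small sketch, using a hypothetical naiveReverseSteps counter (again, my name, purely for illustration):

```haskell
-- Count the cons cells built by the appends in the naive reverse.
-- At each step, reverse xs ++ [x] walks the length-(length xs) list
-- reverse xs, so the total is (n-1) + (n-2) + ... + 0 = n*(n-1)/2.
naiveReverseSteps :: [a] -> Int
naiveReverseSteps []     = 0
naiveReverseSteps (_:xs) = naiveReverseSteps xs + length xs

main :: IO ()
main = print (naiveReverseSteps [1 .. 100 :: Int])  -- 4950
```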
reverse :: [a] -> [a]
reverse list = reverse' list []
where
reverse' [] reversed = reversed
reverse' (x:xs) reversed = reverse' xs (x:reversed)
Enter the accumulating reverse function. Here, we use an additional argument to "build up" the reversed list in the correct order. Rather than building the entire reversed list and then using (++) to add something to the end (remember, adding to the end of a big list is inefficient), we keep track of everything we want at the end of the list and repeatedly put things at the beginning (putting things at the beginning of a linked list is incredibly efficient).
Compare the evaluation of the two reverse functions on even a small list like [1, 2, 3] (again, I'm assuming we're immediately forcing the value and hence ignoring laziness):
-- First function
reverse [1, 2, 3]
reverse [2, 3] ++ [1]
(reverse [3] ++ [2]) ++ [1]
((reverse [] ++ [3]) ++ [2]) ++ [1]
(([] ++ [3]) ++ [2]) ++ [1]
([3] ++ [2]) ++ [1]
(3 : ([] ++ [2])) ++ [1]
[3, 2] ++ [1]
3 : ([2] ++ [1])
3 : (2 : ([] ++ [1]))
[3, 2, 1]
-- Second function
reverse [1, 2, 3]
reverse' [1, 2, 3] []
reverse' [2, 3] [1]
reverse' [3] [2, 1]
reverse' [] [3, 2, 1]
[3, 2, 1]
Note how the second function is much cleaner: it shuffles around far less data and never rebuilds structures unnecessarily.
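As an aside, this accumulator pattern is exactly a strict left fold that conses each element onto the accumulator. The sketch below uses foldl' from Data.List; the Prelude's actual definition of reverse may differ in detail, but it follows the same accumulating strategy:

```haskell
import Data.List (foldl')

-- The accumulator pattern as a left fold: start with [], and cons
-- each element onto the front of the accumulator as we walk the list.
reverseAcc :: [a] -> [a]
reverseAcc = foldl' (flip (:)) []

main :: IO ()
main = print (reverseAcc [1, 2, 3 :: Int])  -- [3,2,1]
```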
Upvotes: 6