Why is this tail recursion not faster than its non-tail recursive implementation?

Question

disclaimer: I'm quite new to elixir and the following might be quite trivial. Might simply be a language detail I'm not aware of.

TL;DR: The following 2 snippets of code simply implement a list mapping. The functions are called here accumulate, but they simply walk through the list and a apply a given function to each method. In my head, I think version 2 should be faster since it's tail recursive (or at least I thought it was. I might be wrong). It's not. It's significantly slower than version 1 and I can't understand why. I'd like some help with this.

The issue

I come from the java world where tail-call optimization is only spoken of but as you might know not present in the language. I guess between several other reasons nobody actually cared enough to implement such a feature in an imperative language where most of the recursive methods can be written in a iterative fashion. In any case, I've started learning elixir and one of the things I've picked up is a problem in Exercism called accumulate.

It's quite easy to understand. We're asked to implement a list mapping function - go through a given list and to each element apply a given function and at the end spit out the mapped list.

Here's my 1st go at the problem:

def accumulate([], _) do
  []
end

def accumulate([head | tail], fun) do
  [fun.(head) | accumulate(tail, fun)]
end

As I submitted the code I received a comment stating that this approach would not perform well for large inputs. In fact, the call stack would linearly grow with the input. The person who commented also challenged me to come up with a way where this wouldn't be a problem. Instantly my brain jumped into optimizing this with tail-recursion (not sure if this is the right way to go, but that's what I went with). So I cooked up this 2nd implementation:

defp accumulate([], _, acc) do
  acc
end

defp accumulate([head | tail], fun, acc) do
  accumulate(tail, fun, acc ++ [fun.(head)])
end

def accumulate(l, fun) do
  accumulate(l, fun, [])
end

Again, this whole situation made me question if I even understood what tail recursion is. Meaning, I might have done something completely different than what I think I did. In any case I decided to run a very simple benchmark. I searched online and used this function:

def measure(function) do
  function
  |> :timer.tc
  |> elem(0)
  |> Kernel./(1_000_000)
end

I've run it with a list of 100000 (hundred thousand) 5s. Yes, I literally started a python shell and issued [5] * 100000 and copy pasted the output into an elixir file. The mapping function was just squaring the numbers. The whole thing ended up like this:

def test() do
  IO.inspect Benchmark.measure(fn ->
    accumulate(Data.data, fn x -> x*x end)
  end)
end

(Data.data is the list with the mentioned 100000 5s)

On my machine this yields around 0.006 seconds for the 1st implementation and around 28 seconds for the 2nd. Like I said this is not what I was expecting. I was expecting the exact opposite. So here are my questions:

1. Am I grasping this tail-recursion optimization thing correctly?

2. What's the crucial difference between both approaches that makes this whole thing differ so much in time?

PS: I've took a look at the Erlang lists module and saw that the method map is actually implemented the same way as in my 1st approach (at least looks like it), so I'm guessing there's a reason for this?

Dogbert · Accepted Answer

The second version is tail recursive, but ++ needs to make a copy of the whole LHS. Since the LHS is the accumulator here, your function becomes O(n^2) instead of O(n). The solution is to accumulate the list in reverse order and then call :lists.reverse/1 at the end. This will be O(n) because prepending an element to a list is O(1) and reversing a list is O(n). This idiom is quite common in Elixir and Erlang code.

defp accumulate([], _, acc) do
  :lists.reverse(acc)
end

defp accumulate([head | tail], fun, acc) do
  accumulate(tail, fun, [fun.(head) | acc])
end

def accumulate(l, fun) do
  accumulate(l, fun, [])
end

This might still not be faster than a naive non tail recursive function because Erlang optimizes these cases to be as fast as a tail recursive version as explained in this myth.

Why is this tail recursion not faster than its non-tail recursive implementation?

Answers (2)

Related Questions