Reputation: 7326
I tried to do some quick benchmarking using Benchfella:
defmodule ConcatListBench do
  use Benchfella

  @a1 Enum.to_list(1..10_000)
  @a2 Enum.to_list(10_000..20_000)

  bench "++" do
    @a1 ++ @a2
  end

  bench "Enum.concat" do
    Enum.concat(@a1, @a2)
  end
end
And here is the output from running it:
$ elixir -v
Erlang/OTP 19 [erts-8.0.2] [source] [64-bit] [smp:4:4] [async-threads:10] [hipe] [kernel-poll:false] [dtrace]
Elixir 1.4.0-dev (762e7de)
$ mix bench
Settings:
duration: 1.0 s
## ConcatListBench
[10:01:09] 1/2: ++
[10:01:20] 2/2: Enum.concat
Finished in 14.03 seconds
## ConcatListBench
benchmark name    iterations   average time
++                1000000000   0.01 µs/op
Enum.concat            50000   45.03 µs/op
The question is: how can Enum.concat be over 4,000 times slower if it uses the ++ operator internally for lists? I understand that the guard clauses and pattern matching in Enum.concat cost some time, but the benchmark shows a huge difference, doesn't it?
UPDATE: This happens because of constant folding: the concatenation with ++ is evaluated at compile time, so it takes practically no time at runtime. The benchmark is therefore not realistic.
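One way to double-check this outside Benchfella (a quick sketch, not part of the original benchmark) is to build the lists at runtime, so nothing can be folded, and time single calls with :timer.tc:
a = Enum.to_list(1..10_000)
b = Enum.to_list(10_000..20_000)

# :timer.tc/1 runs the fun once and returns {microseconds, result};
# single runs are noisy, but they show that ++ is no longer "free".
{plus_us, _} = :timer.tc(fn -> a ++ b end)
{concat_us, _} = :timer.tc(fn -> Enum.concat(a, b) end)

IO.puts("++: #{plus_us} µs, Enum.concat: #{concat_us} µs")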
Upvotes: 10
Views: 1885
Reputation: 222158
Short answer: Constant Folding.
Longer answer: Module attributes in Elixir are replaced with their literal values when the code is compiled to BEAM files. For example, the following code:
defmodule ConcatListBench do
  @a1 Enum.to_list(1..10)
  @a2 Enum.to_list(10..20)

  def plusplus, do: @a1 ++ @a2
  def concat, do: Enum.concat(@a1, @a2)
end
compiles to:
-module('Elixir.ConcatListBench').
...
concat() ->
    'Elixir.Enum':concat([1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                         [10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]).

plusplus() ->
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] ++
      [10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20].
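If you want to see this output yourself, one way (a sketch, and it assumes the module was compiled with debug info, which mix does by default in dev) is to pretty-print the abstract code stored in the .beam file:
# Sketch: pretty-print a compiled module's Erlang abstract code. Requires the
# :abstract_code chunk to be available, i.e. the module was compiled with
# debug info.
beam = :code.which(ConcatListBench)

{:ok, {_module, [abstract_code: {_version, forms}]}} =
  :beam_lib.chunks(beam, [:abstract_code])

IO.puts(:erl_prettypr.format(:erl_syntax.form_list(forms)))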
The Erlang compiler's sys_core_fold module, which performs constant folding, evaluates ++ operations at compile time whenever it can. Since both lists here are literals, it can eliminate the call entirely and replace it with the resulting list. So in your benchmark, the ++ function is just returning a list that already exists in the VM. It's as fast as doing 1 + 2 (which is also constant folded to 3):
...
bench "1 + 2" do
  1 + 2
end
...
## ConcatListBench
benchmark name    iterations   average time
1 + 2             1000000000   0.01 µs/op
++                1000000000   0.01 µs/op
Enum.concat            50000   37.89 µs/op
A more realistic benchmark is to call ++ indirectly, which the Erlang compiler does not fold:
def plus_plus(a, b), do: a ++ b

bench "++" do
  plus_plus(@a1, @a2)
end
These are the outputs of 3 runs:
## ConcatListBench
benchmark name    iterations   average time
Enum.concat            50000   37.44 µs/op
++                     50000   41.65 µs/op

## ConcatListBench
benchmark name    iterations   average time
++                     50000   36.07 µs/op
Enum.concat            50000   38.58 µs/op

## ConcatListBench
benchmark name    iterations   average time
Enum.concat            50000   39.34 µs/op
++                     50000   40.74 µs/op
So really, if your lists aren't constant at compile time, both ways are just as fast. I would expect Enum.concat to be a tiny bit slower (especially for small lists) as it does a bit more work than ++.
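For reference, the extra work is small: when both arguments are lists, Enum.concat/2 essentially just checks a guard and delegates to ++. Roughly (paraphrased; the exact source differs between Elixir versions):
# Paraphrased sketch of Enum.concat/2's list clause: a guard check, then plain ++.
def concat(left, right) when is_list(left) and is_list(right) do
  left ++ right
end

# Anything else goes through the generic Enumerable-based path.
def concat(left, right) do
  concat([left, right])
end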
Upvotes: 21