Why Enum.map is more efficient than Enum.count in Elixir?

Question

I've found that counting with Enum.map |> Enum.sum works much faster than Enum.count. But why is not the built-in count function efficient?

Compare these two lines:

elements |> Enum.count(&check/1)
elements |> Enum.map(&(if check(&1), do: 1, else: 0)) |> Enum.sum

The results of tests with different length of list:

len   | map(), µs | count(), µs
===============================
   10 |      0.67 |    2.55
  100 |      5.52 |    8.91
 1000 |     59.00 |   73.12
10000 |    642.50 |  734.60

Source code: https://gist.github.com/artemrizhov/fc146f7ab390f7a807be833099e5cb83

$ elixir --version
Erlang/OTP 19 [erts-8.1] [source-e7be63d] [64-bit] [smp:8:8] [async-threads:10] [hipe] [kernel-poll:false]

Elixir 1.3.4

michalmuskala · Accepted Answer

This is because both Enum.map/2 and Enum.reduce/3 (that is used by sum) are heavily optimised for lists, while Enum.count/2 has only a generic enumerable version.

To add to confusion, there's even a faster implementation:

elements |> Enum.filter(&check/1) |> Enum.count

On my machine, using modified benchmark you provided I get a consistent result of:

len = 10
map:    2.1706 μs
count:  7.0754 μs
filter: 1.9765 μs

len = 100
map:    19.921 μs
count:  27.579 μs
filter: 14.591 μs

len = 1000
map:    168.93 μs
count:  178.93 μs
filter: 163.43 μs

len = 10000
map:    2128.9 μs
count:  1822.1 μs
filter: 1664.6 μs

That said both the "map" and the "filter" version produce more garbage when they operate, so some of the time gains might be consumed by the extended garbage collection time. This is already visible in the benchmarks, as the relative gains between the versions decrease as the length of the length of the list (and the amount of intermediate garbage produced) increase.

EDIT: I submitted a PR that optimises the Enum.count/2 function to be the fastest one https://github.com/elixir-lang/elixir/pull/5567

Why Enum.map is more efficient than Enum.count in Elixir?

Answers (1)

Related Questions