Reputation: 129

Erlang/Elixir Merging maps with mostly duplicate keys

Assume I have two maps (m1, m2) that are expected to have mostly identical kv pairs, but each may have entries the other doesn't. Ultimately, I want a map that contains all kv pairs from both maps, so at a high level I want to merge them.

However, given the implementations of Map.merge (Erlang BIF) and Map.split (tail recursion), as well as the heuristic of expecting the differences to be few proportional to the map size, which of the below options is better suited to achieve the desired outcome?

First split to find the kv pairs unique to m2, and merge only those

...
{_duplicateKeys, m2only} = Map.split(m2, Map.keys(m1))
Map.merge(m1, m2only)
...

Or just merge, hoping the implementation will optimize constructing the map
```
...
Map.merge(m1, m2)
...
```

Upvotes: 2

Answers (3)

Steve Pallen

Reputation: 4507

I would choose the Map.merge approach. Premature optimization is typically an anti-pattern. If you later find performance issues, you can optimize then. Erlang BIFs are typically pretty performant.

EDIT:

Here is a quick benchmark

spallen@Steves-MacBook-Pro ~/myprojects/elixir/maps  time ./map.exs                         
MapDemo running count: 5000000
./map.exs  21.30s user 2.73s system 98% cpu 24.371 total
spallen@Steves-MacBook-Pro  ~/myprojects/elixir/maps  time ./split.exs                       
SplitDemo running count: 5000000
./split.exs  25.68s user 4.28s system 98% cpu 30.479 total

Here is the code

#! /usr/local/bin//elixir
defmodule MapDemo do
  @upper 5000000
  def run do
    IO.puts "MapDemo running count: #{@upper}"
    map1 =
      0..@upper
      |> Enum.map(& {"key_#{&1}", &1})
      |> Enum.into(%{})

    map2 =
      100..(@upper + 100)
      |> Enum.map(& {"key_#{&1}", &1})
      |> Enum.into(%{})

    Map.merge(map1, map2)
  end
end

MapDemo.run

and

#! /usr/local/bin//elixir
defmodule SplitDemo do
  @upper 5000000
  def run do
    IO.puts "SplitDemo running count: #{@upper}"
    map1 =
      0..@upper
      |> Enum.map(& {"key_#{&1}", &1})
      |> Enum.into(%{})

    map2 =
      100..(@upper + 100)
      |> Enum.map(& {"key_#{&1}", &1})
      |> Enum.into(%{})

    {_duplicateKeys, m2only} = Map.split(map2, Map.keys(map1))
    Map.merge(map1, m2only)
  end
end

SplitDemo.run

Upvotes: 3

Mike Buhot

Reputation: 4885

I'd expect the cost of split to outweigh any savings in merge.

When the maps are large, it looks like the BIF uses hashmap_merge, which has this comment in the source:

/*
 * Strategy: Do depth-first traversal of both trees (at the same time)
 * and merge each pair of nodes.
 */

The implementation appears to detect when there are identical sub-trees in the maps:

switch (sp->mix) {
    case 0: /* Nodes A and B contain the *EXACT* same sub-trees
               => fall through and reuse nodeA */

    case 1: /* Only unique A stuff => reuse nodeA */
        res = sp->nodeA;
        break;

    case 2: /* Only unique B stuff => reuse nodeB */
        res = sp->nodeB;
        break;

    case 3: /* We have a mix => must build new node */

Upvotes: 3

usernolan

Reputation: 129

After looking at the documentation for the Map.merge BIF and discussing in #elixir-lang with asonge, it was decided that simply merging was the appropriate solution.

...
Map.merge(m1, m2)
...

There are some nuances to the solution: scale of the map is important in deciding how the map is represented in memory--flatmap for small, hashmap for large.

However, simply merging was chosen because a hashmap is used for large maps where the optimization would matter. Since many nodes will compare equal in mostly similar maps, the algorithm should only need to reconstruct a small portion of the tree underlying the map. At least that's what our research indicated; this hasn't been tested in practice

Upvotes: 2

Erlang/Elixir Merging maps with mostly duplicate keys

Answers (3)

Related Questions