Reputation: 129
Assume I have two maps (m1, m2) that are expected to have mostly identical kv pairs, but each may have entries the other doesn't. Ultimately, I want a map that contains all kv pairs from both maps, so at a high level I want to merge them.
However, given the implementations of Map.merge (Erlang BIF) and Map.split (tail recursion), as well as the heuristic of expecting the differences to be few proportional to the map size, which of the below options is better suited to achieve the desired outcome?
First split to find the kv pairs unique to m2, and merge only those
...
{_duplicateKeys, m2only} = Map.split(m2, Map.keys(m1))
Map.merge(m1, m2only)
...
Or just merge, hoping the implementation will optimize constructing the map
...
Map.merge(m1, m2)
...
Upvotes: 2
Views: 1028
Reputation: 4507
I would choose the Map.merge
approach. Premature optimization is typically an anti-pattern. If you later find performance issues, you can optimize then. Erlang BIFs are typically pretty performant.
EDIT:
Here is a quick benchmark
spallen@Steves-MacBook-Pro ~/myprojects/elixir/maps time ./map.exs
MapDemo running count: 5000000
./map.exs 21.30s user 2.73s system 98% cpu 24.371 total
spallen@Steves-MacBook-Pro ~/myprojects/elixir/maps time ./split.exs
SplitDemo running count: 5000000
./split.exs 25.68s user 4.28s system 98% cpu 30.479 total
Here is the code
#! /usr/local/bin//elixir
defmodule MapDemo do
@upper 5000000
def run do
IO.puts "MapDemo running count: #{@upper}"
map1 =
0..@upper
|> Enum.map(& {"key_#{&1}", &1})
|> Enum.into(%{})
map2 =
100..(@upper + 100)
|> Enum.map(& {"key_#{&1}", &1})
|> Enum.into(%{})
Map.merge(map1, map2)
end
end
MapDemo.run
and
#! /usr/local/bin//elixir
defmodule SplitDemo do
@upper 5000000
def run do
IO.puts "SplitDemo running count: #{@upper}"
map1 =
0..@upper
|> Enum.map(& {"key_#{&1}", &1})
|> Enum.into(%{})
map2 =
100..(@upper + 100)
|> Enum.map(& {"key_#{&1}", &1})
|> Enum.into(%{})
{_duplicateKeys, m2only} = Map.split(map2, Map.keys(map1))
Map.merge(map1, m2only)
end
end
SplitDemo.run
Upvotes: 3
Reputation: 4885
I'd expect the cost of split
to outweigh any savings in merge
.
When the maps are large, it looks like the BIF uses hashmap_merge, which has this comment in the source:
/*
* Strategy: Do depth-first traversal of both trees (at the same time)
* and merge each pair of nodes.
*/
The implementation appears to detect when there are identical sub-trees in the maps:
switch (sp->mix) {
case 0: /* Nodes A and B contain the *EXACT* same sub-trees
=> fall through and reuse nodeA */
case 1: /* Only unique A stuff => reuse nodeA */
res = sp->nodeA;
break;
case 2: /* Only unique B stuff => reuse nodeB */
res = sp->nodeB;
break;
case 3: /* We have a mix => must build new node */
Upvotes: 3
Reputation: 129
After looking at the documentation for the Map.merge BIF and discussing in #elixir-lang with asonge, it was decided that simply merging was the appropriate solution.
...
Map.merge(m1, m2)
...
There are some nuances to the solution: scale of the map is important in deciding how the map is represented in memory--flatmap for small, hashmap for large.
However, simply merging was chosen because a hashmap is used for large maps where the optimization would matter. Since many nodes will compare equal in mostly similar maps, the algorithm should only need to reconstruct a small portion of the tree underlying the map. At least that's what our research indicated; this hasn't been tested in practice
Upvotes: 2