Tango

c# merging multiple dictionaries into one

First, I'd like to mention that I only started learning C# a few days ago, so my knowledge of it is limited.

I am merging multiple dictionaries that share the same key and value types into a single dictionary.

The following is my approach, which works and also handles duplicate keys:

var result = dict1.Concat(dict2).GroupBy(d => d.Key)
    .ToDictionary(d => d.Key, d => d.First().Value);

result = result.Concat(dict3).GroupBy(d => d.Key)
    .ToDictionary(d => d.Key, d => d.First().Value);

result = result.Concat(dict4).GroupBy(d => d.Key)
    .ToDictionary(d => d.Key, d => d.First().Value);

result = result.Concat(dict5).GroupBy(d => d.Key)
    .ToDictionary(d => d.Key, d => d.First().Value);

Is there a more efficient way to merge multiple dictionaries whose keys and values have the same types?

Upvotes: 2

Answers (2)

Olivier Jacot-Descombes

Since dictionaries implement IEnumerable<KeyValuePair<TKey, TValue>>, you can simply write:

var result = dict1
    .Concat(dict2)
    .Concat(dict3)
    .Concat(dict4)
    .Concat(dict5)
    .ToDictionary(e => e.Key, e => e.Value);

This assumes that there are no duplicate keys; if the same key occurs more than once, ToDictionary throws an ArgumentException.
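A minimal runnable sketch of the Concat approach, assuming two small hypothetical dictionaries `a` and `b` with `string` keys and `int` values (any key/value types would do). It also shows what happens when the no-duplicates assumption is violated: ToDictionary throws rather than picking a value.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data with no overlapping keys.
var a = new Dictionary<string, int> { ["x"] = 1, ["y"] = 2 };
var b = new Dictionary<string, int> { ["z"] = 3 };

// Dictionaries are sequences of KeyValuePair<TKey, TValue>, so Concat works directly.
Dictionary<string, int> merged = a.Concat(b).ToDictionary(e => e.Key, e => e.Value);
Console.WriteLine(merged.Count); // 3

// A second (hypothetical) dictionary that repeats the key "x".
var c = new Dictionary<string, int> { ["x"] = 10 };
bool threw = false;
try
{
    a.Concat(c).ToDictionary(e => e.Key, e => e.Value);
}
catch (ArgumentException)
{
    // ToDictionary rejects the second "x" instead of choosing one of the values.
    threw = true;
}
Console.WriteLine(threw); // True
```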

If there are duplicate keys, you could take the first value for each key:

result = dict1
    .Concat(dict2)
    .Concat(dict3)
    .Concat(dict4)
    .Concat(dict5)
    .GroupBy(e => e.Key)
    .ToDictionary(g => g.Key, g => g.First().Value);

Other variants are conceivable, such as keeping the maximum or minimum value for each key.
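As a sketch of one such variant, the group can be reduced with Max instead of First. The dictionaries `d1`, `d2` and `d3` are hypothetical, and `int` values are assumed so that Max applies; swapping Max for Min keeps the smallest value instead.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data; "a" and "b" appear in more than one dictionary.
var d1 = new Dictionary<string, int> { ["a"] = 1, ["b"] = 5 };
var d2 = new Dictionary<string, int> { ["a"] = 7, ["c"] = 2 };
var d3 = new Dictionary<string, int> { ["b"] = 3 };

// Group all pairs by key, then reduce each group to its largest value.
Dictionary<string, int> maxPerKey = d1
    .Concat(d2)
    .Concat(d3)
    .GroupBy(e => e.Key)
    .ToDictionary(g => g.Key, g => g.Max(e => e.Value));

foreach (var pair in maxPerKey.OrderBy(p => p.Key))
    Console.WriteLine($"{pair.Key} = {pair.Value}");
// a = 7
// b = 5
// c = 2
```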

If duplicate keys can have different values, you could also create a dictionary of value lists:

Dictionary<TKey, List<TValue>> result = dict1
    .Concat(dict2)
    .Concat(dict3)
    .Concat(dict4)
    .Concat(dict5)
    .GroupBy(e => e.Key, e => e.Value)
    .ToDictionary(g => g.Key, g => g.ToList());

Instead of creating a List<T> of values, you could insert them into a HashSet<T> to keep only the distinct values per key.
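A sketch of the HashSet<T> variant, assuming hypothetical dictionaries `d1`, `d2` and `d3` with `string` keys and `int` values. Each IGrouping is itself an IEnumerable of the grouped values, so it can be passed straight to the HashSet<T> constructor.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data: "a" maps to 1 twice and to 4 once.
var d1 = new Dictionary<string, int> { ["a"] = 1, ["b"] = 2 };
var d2 = new Dictionary<string, int> { ["a"] = 1, ["b"] = 3 };
var d3 = new Dictionary<string, int> { ["a"] = 4 };

// Collect every value seen for each key; the HashSet drops repeated values.
Dictionary<string, HashSet<int>> valueSets = d1
    .Concat(d2)
    .Concat(d3)
    .GroupBy(e => e.Key, e => e.Value)
    .ToDictionary(g => g.Key, g => new HashSet<int>(g));

// HashSet has no defined order, so sort before printing.
Console.WriteLine(string.Join(", ", valueSets["a"].OrderBy(v => v))); // 1, 4
Console.WriteLine(string.Join(", ", valueSets["b"].OrderBy(v => v))); // 2, 3
```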

If duplicate keys always map to the same value, simply use Union instead of Concat:

var result = dict1
    .Union(dict2)
    .Union(dict3)
    .Union(dict4)
    .Union(dict5)
    .ToDictionary(e => e.Key, e => e.Value);

Union produces the set union of two sequences, so identical key/value pairs appear only once. Concat simply appends one sequence to the other and keeps every duplicate.
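A small sketch contrasting the two, using hypothetical dictionaries `d1` and `d2` that share the identical pair ("a", 1). Union relies on the default equality of KeyValuePair<TKey, TValue>, which compares both the key and the value; a key that appears with two different values would still produce two pairs, and ToDictionary would then throw.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data: ("a", 1) is present in both dictionaries.
var d1 = new Dictionary<string, int> { ["a"] = 1, ["b"] = 2 };
var d2 = new Dictionary<string, int> { ["a"] = 1, ["c"] = 3 };

// Concat keeps both copies of the identical pair ("a", 1).
int concatCount = d1.Concat(d2).Count();

// Union compares whole KeyValuePair values and keeps only one copy.
int unionCount = d1.Union(d2).Count();

Console.WriteLine($"Concat: {concatCount}, Union: {unionCount}"); // Concat: 4, Union: 3

// With Union the duplicate pair is gone, so ToDictionary succeeds.
var merged = d1.Union(d2).ToDictionary(e => e.Key, e => e.Value);
Console.WriteLine(merged.Count); // 3
```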

Finally, you can combine the two preceding approaches and discard equal key/value pairs, but keep a list of different values per key:

Dictionary<TKey, List<TValue>> result = dict1
    .Union(dict2)
    .Union(dict3)
    .Union(dict4)
    .Union(dict5)
    .GroupBy(e => e.Key, e => e.Value)
    .ToDictionary(g => g.Key, g => g.ToList());
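A runnable sketch of this combination with hypothetical dictionaries `d1`, `d2` and `d3`: the pair ("a", 1) occurs three times but survives Union only once, while the two genuinely different values for "b" both end up in its list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data.
var d1 = new Dictionary<string, int> { ["a"] = 1, ["b"] = 2 };
var d2 = new Dictionary<string, int> { ["a"] = 1, ["b"] = 5 };
var d3 = new Dictionary<string, int> { ["a"] = 1 };

// Union removes the repeated ("a", 1) pairs; GroupBy then collects the
// remaining, different values per key into a list.
Dictionary<string, List<int>> valueLists = d1
    .Union(d2)
    .Union(d3)
    .GroupBy(e => e.Key, e => e.Value)
    .ToDictionary(g => g.Key, g => g.ToList());

Console.WriteLine(string.Join(", ", valueLists["a"])); // 1
Console.WriteLine(string.Join(", ", valueLists["b"])); // 2, 5
```

With Concat in place of Union, the list for "a" would instead contain 1 three times.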

These examples show that it is important to know exactly how the input data is shaped (unique or non-unique keys and key/value pairs) and precisely what kind of result you expect.


A different approach would be to have your methods return lists or enumerations instead of dictionaries and to merge these collections into a single dictionary at the end. This avoids building a dictionary for every source and should perform better.
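A sketch of that idea, assuming two hypothetical producer methods, `LoadFirst` and `LoadSecond`, that yield key/value pairs instead of each building a Dictionary. The dictionary is built once, at the end, and here the first value seen for a key wins (an assumed policy; any of the duplicate-handling rules above could be used instead).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical producer: yields pairs instead of returning a Dictionary.
IEnumerable<KeyValuePair<string, int>> LoadFirst()
{
    yield return new KeyValuePair<string, int>("a", 1);
    yield return new KeyValuePair<string, int>("b", 2);
}

// Hypothetical producer that repeats the key "b".
IEnumerable<KeyValuePair<string, int>> LoadSecond()
{
    yield return new KeyValuePair<string, int>("b", 20);
    yield return new KeyValuePair<string, int>("c", 3);
}

// Build the only dictionary at the end; the first value seen for a key is kept.
var result = new Dictionary<string, int>();
foreach (var pair in LoadFirst().Concat(LoadSecond()))
{
    if (!result.ContainsKey(pair.Key))
        result.Add(pair.Key, pair.Value);
}

Console.WriteLine(string.Join(", ", result.OrderBy(p => p.Key).Select(p => $"{p.Key}={p.Value}")));
// a=1, b=2, c=3
```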

Upvotes: 21

Although it doesn't use any pretty LINQ, I think the following will be more efficient. It creates only one additional dictionary: the result. That dictionary is sized up front for the total number of entries, so it never has to grow, and the number of inserts is exactly the number of elements in the result.

I think this will be more efficient than creating several intermediate dictionaries or other collections, or than building the result in a way that forces the new or intermediate dictionaries through repeated growth resizes. In the second foreach, I'm not sure whether it's more efficient to call ContainsKey on dict1 or on result. I chose dict1 because, while iterating dict2, result contains only dict1's keys plus keys from dict2 that were just added, and since a dictionary's keys are unique, no key from dict2 can come up twice. Checking dict1 is therefore sufficient.

var result = new Dictionary<MyKeyType, MyValueType>(dict1.Count + dict2.Count + dict3.Count
    + dict4.Count + dict5.Count);
foreach(var pair in dict1) {
    result.Add(pair.Key, pair.Value);
}
foreach(var pair in dict2) {
    if (!dict1.ContainsKey(pair.Key)) result.Add(pair.Key, pair.Value);
}
foreach(var pair in dict3) {
    if (!result.ContainsKey(pair.Key)) result.Add(pair.Key, pair.Value);
}
foreach(var pair in dict4) {
    if (!result.ContainsKey(pair.Key)) result.Add(pair.Key, pair.Value);
}
foreach(var pair in dict5) {
    if (!result.ContainsKey(pair.Key)) result.Add(pair.Key, pair.Value);
}
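The same foreach approach can be generalized to any number of dictionaries. This is a sketch, not the code that was timed below: it assumes .NET Core 2.0 or later (or .NET Standard 2.1), where Dictionary<TKey, TValue>.TryAdd exists. TryAdd performs the containment check and the insert in one call and returns false, without throwing, if the key is already present. The helper name `MergeFirstWins` is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: merges any number of dictionaries; for a key present in
// several of them, the value from the earliest dictionary is kept.
Dictionary<TKey, TValue> MergeFirstWins<TKey, TValue>(params Dictionary<TKey, TValue>[] sources)
{
    // Pre-size for the worst case (no overlapping keys) so the result never resizes.
    var result = new Dictionary<TKey, TValue>(sources.Sum(d => d.Count));
    foreach (var source in sources)
    {
        foreach (var pair in source)
        {
            // TryAdd leaves an existing entry untouched and returns false.
            result.TryAdd(pair.Key, pair.Value);
        }
    }
    return result;
}

// Hypothetical sample data; "a" appears in both d1 and d2.
var d1 = new Dictionary<string, int> { ["a"] = 1 };
var d2 = new Dictionary<string, int> { ["a"] = 2, ["b"] = 3 };
var d3 = new Dictionary<string, int> { ["c"] = 4 };

var merged = MergeFirstWins(d1, d2, d3);
Console.WriteLine(merged.Count); // 3
```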

In a timing test with five large dictionaries whose keys were mostly unique across all of them, the results were as follows (times in milliseconds):

  • 1037 ms for your code
  • 357 ms for the middle block of code in the other answer using Linq
  • 784 ms for the third block of code in the other answer using Linq
  • 43 ms for the code above using foreach

If a key appears in more than one dictionary, the value from the first dictionary containing it is used, since you haven't specified how you want that case handled.
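If the last value should win instead, one option (a swapped-in variant, not part of the answer's code) is to replace the ContainsKey/Add pair with the indexer setter, which inserts new keys and overwrites existing ones. The sketch below runs both rules side by side on hypothetical dictionaries `d1` and `d2`.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sample data; "a" appears in both dictionaries.
var d1 = new Dictionary<string, int> { ["a"] = 1, ["b"] = 2 };
var d2 = new Dictionary<string, int> { ["a"] = 10 };

var firstWins = new Dictionary<string, int>(d1.Count + d2.Count);
var lastWins = new Dictionary<string, int>(d1.Count + d2.Count);

foreach (var source in new[] { d1, d2 })
{
    foreach (var pair in source)
    {
        // Same rule as the answer: only add keys not seen yet.
        if (!firstWins.ContainsKey(pair.Key)) firstWins.Add(pair.Key, pair.Value);

        // Indexer assignment inserts or overwrites, so later dictionaries win.
        lastWins[pair.Key] = pair.Value;
    }
}

Console.WriteLine($"{firstWins["a"]} {lastWins["a"]}"); // 1 10
```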

Upvotes: 2