Reputation: 396
Firstly, I'd just like to mention that I only started learning C# a few days ago, so my knowledge of it is limited.
I am merging multiple dictionaries that have the same key and value types into a single one.
The following is my approach, which works and also handles duplicates:
var result = dict1.Concat(dict2).GroupBy(d => d.Key)
.ToDictionary(d => d.Key, d => d.First().Value);
result = result.Concat(dict3).GroupBy(d => d.Key)
.ToDictionary(d => d.Key, d => d.First().Value);
result = result.Concat(dict4).GroupBy(d => d.Key)
.ToDictionary(d => d.Key, d => d.First().Value);
result = result.Concat(dict5).GroupBy(d => d.Key)
.ToDictionary(d => d.Key, d => d.First().Value);
I would like to know whether there is a more efficient way to merge multiple dictionaries whose keys and values have the same types.
Upvotes: 2
Views: 12461
Reputation: 112382
Since dictionaries implement IEnumerable<KeyValuePair<TKey, TValue>>, you can simply write:
var result = dict1
.Concat(dict2)
.Concat(dict3)
.Concat(dict4)
.Concat(dict5)
.ToDictionary(e => e.Key, e => e.Value);
This assumes that there are no duplicate keys.
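To make that assumption concrete, here is a minimal sketch using two hypothetical string/int dictionaries (dict1, dict2, invented for illustration) that share the key "b". As far as I know, ToDictionary adds each element with Dictionary.Add, so a repeated key makes it throw an ArgumentException instead of silently picking one of the values. The sketch assumes C# 9 top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample inputs; both contain the key "b".
var dict1 = new Dictionary<string, int> { ["a"] = 1, ["b"] = 2 };
var dict2 = new Dictionary<string, int> { ["b"] = 20, ["c"] = 3 };

bool threw = false;
try
{
    // Concat keeps both ("b", 2) and ("b", 20), so ToDictionary
    // encounters the key "b" twice.
    var merged = dict1.Concat(dict2).ToDictionary(e => e.Key, e => e.Value);
    Console.WriteLine($"Merged {merged.Count} keys");
}
catch (ArgumentException ex)
{
    threw = true;
    Console.WriteLine($"Duplicate key: {ex.Message}");
}
```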
If there are duplicate keys, you could keep the first value for each key:
var result = dict1
.Concat(dict2)
.Concat(dict3)
.Concat(dict4)
.Concat(dict5)
.GroupBy(e => e.Key)
.ToDictionary(g => g.Key, g => g.First().Value);
Other variants are conceivable, such as keeping the maximum or minimum value for each key.
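As a rough sketch of the "keep the maximum" variant, assuming hypothetical int-valued dictionaries and C# 9 top-level statements: group only the values by key, then reduce each group with Max.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample inputs with overlapping keys.
var dict1 = new Dictionary<string, int> { ["a"] = 1, ["b"] = 2 };
var dict2 = new Dictionary<string, int> { ["b"] = 20, ["c"] = 3 };
var dict3 = new Dictionary<string, int> { ["a"] = 5, ["c"] = 0 };

// Group only the values by key, then keep the largest value of each group.
var maxPerKey = dict1
    .Concat(dict2)
    .Concat(dict3)
    .GroupBy(e => e.Key, e => e.Value)
    .ToDictionary(g => g.Key, g => g.Max());

foreach (var pair in maxPerKey.OrderBy(p => p.Key))
{
    Console.WriteLine($"{pair.Key} = {pair.Value}"); // a = 5, b = 20, c = 3
}
```

Replacing Max with Min keeps the smallest value instead.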
If there are duplicate keys with different values, you could also create a dictionary of value lists:
Dictionary<TKey, List<TValue>> result = dict1
.Concat(dict2)
.Concat(dict3)
.Concat(dict4)
.Concat(dict5)
.GroupBy(e => e.Key, e => e.Value)
.ToDictionary(g => g.Key, g => g.ToList());
Instead of creating a List<T> of values, you could insert them into a HashSet<T> to keep only unique values.
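A minimal sketch of the HashSet variant, assuming hypothetical string/int inputs and C# 9 top-level statements. Each group produced by GroupBy is an IEnumerable of values, so it can be passed straight to the HashSet<T> constructor, which drops repeated values.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample inputs: "a" maps to 1 in both, "b" maps to different values.
var dict1 = new Dictionary<string, int> { ["a"] = 1, ["b"] = 2 };
var dict2 = new Dictionary<string, int> { ["a"] = 1, ["b"] = 3 };

// Each group (an IEnumerable<int> of values) is copied into a HashSet,
// which silently ignores values it already contains.
Dictionary<string, HashSet<int>> valueSets = dict1
    .Concat(dict2)
    .GroupBy(e => e.Key, e => e.Value)
    .ToDictionary(g => g.Key, g => new HashSet<int>(g));

foreach (var pair in valueSets)
{
    Console.WriteLine($"{pair.Key}: {string.Join(", ", pair.Value)}");
}
```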
If the values are always the same for duplicate keys, simply use Union instead of Concat:
var result = dict1
.Union(dict2)
.Union(dict3)
.Union(dict4)
.Union(dict5)
.ToDictionary(e => e.Key, e => e.Value);
Union produces the set union of two sequences, so identical key/value pairs appear only once; Concat simply concatenates two sequences.
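A small sketch of the difference, assuming hypothetical string/int inputs and C# 9 top-level statements. It relies on the default equality for KeyValuePair<string, int>, which (as far as I know) compares both the key and the value, so Union only removes a pair when both parts match.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample inputs: the pair ("b", 2) is present in both.
var dict1 = new Dictionary<string, int> { ["a"] = 1, ["b"] = 2 };
var dict2 = new Dictionary<string, int> { ["b"] = 2, ["c"] = 3 };

// Concat yields all four pairs; Union yields the shared ("b", 2) pair only once.
int concatCount = dict1.Concat(dict2).Count();
int unionCount = dict1.Union(dict2).Count();
Console.WriteLine($"Concat: {concatCount}, Union: {unionCount}"); // Concat: 4, Union: 3

// Because the duplicate pair is gone, ToDictionary no longer sees "b" twice.
var merged = dict1.Union(dict2).ToDictionary(e => e.Key, e => e.Value);
```

If dict2 mapped "b" to a different value, Union would keep both pairs and ToDictionary would throw again, which is why this only works when duplicate keys always carry the same value.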
Finally, you can combine the two preceding approaches and discard equal key/value pairs, but keep a list of different values per key:
Dictionary<TKey, List<TValue>> result = dict1
.Union(dict2)
.Union(dict3)
.Union(dict4)
.Union(dict5)
.GroupBy(e => e.Key, e => e.Value)
.ToDictionary(g => g.Key, g => g.ToList());
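A sketch of the combined approach with concrete, hypothetical string/int inputs (C# 9 top-level statements assumed): the identical pair is removed by Union, while the conflicting values for the same key both end up in that key's list, in the order they were first encountered.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample inputs: ("a", 1) is identical in both, "b" differs.
var dict1 = new Dictionary<string, int> { ["a"] = 1, ["b"] = 2 };
var dict2 = new Dictionary<string, int> { ["a"] = 1, ["b"] = 5 };

// Union removes the identical ("a", 1) pair; GroupBy then collects the
// remaining values per key, preserving their order in the source sequence.
Dictionary<string, List<int>> valueLists = dict1
    .Union(dict2)
    .GroupBy(e => e.Key, e => e.Value)
    .ToDictionary(g => g.Key, g => g.ToList());

// Expected contents: "a" -> [1], "b" -> [2, 5]
foreach (var pair in valueLists)
{
    Console.WriteLine($"{pair.Key}: [{string.Join(", ", pair.Value)}]");
}
```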
These examples show that it is important to know exactly how the input data is shaped (unique/non-unique keys and key-value-pairs) and precisely what kind of result you expect.
A different approach would be to have your different methods return lists or enumerations instead of dictionaries and merge these collections into a dictionary at the end. This would perform better, since only a single dictionary is built.
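A minimal sketch of that idea, assuming C# 9 top-level statements. LoadFirstBatch and LoadSecondBatch are hypothetical stand-ins for "your different methods"; they yield pairs lazily, and only one dictionary is created at the end, keeping the first value for a duplicate key as the question's GroupBy/First code does.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical producers: each yields key/value pairs instead of
// building its own Dictionary.
IEnumerable<KeyValuePair<string, int>> LoadFirstBatch()
{
    yield return new KeyValuePair<string, int>("a", 1);
    yield return new KeyValuePair<string, int>("b", 2);
}

IEnumerable<KeyValuePair<string, int>> LoadSecondBatch()
{
    yield return new KeyValuePair<string, int>("b", 20);
    yield return new KeyValuePair<string, int>("c", 3);
}

// A single Dictionary is built at the very end; for a duplicate key
// the first value seen is kept.
var result = new Dictionary<string, int>();
foreach (var pair in LoadFirstBatch().Concat(LoadSecondBatch()))
{
    if (!result.ContainsKey(pair.Key))
    {
        result.Add(pair.Key, pair.Value);
    }
}

Console.WriteLine(result.Count); // 3
```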
Upvotes: 21
Reputation: 16259
Although it doesn't use any pretty LINQ, I think the following will be more efficient. It creates only one additional dictionary, the result, and gives it an initial capacity large enough that it never has to grow. In addition, the number of inserts is exactly the number of elements in the resulting dictionary.
I think this will be more efficient than creating several intermediate dictionaries or other collections, or doing things in a way that forces the new dictionary or intermediate dictionaries through multiple growth resizes. In the second foreach (over dict2), I don't know whether it's more efficient to call ContainsKey on dict1 or on result. I checked against dict1 because there is no need to check result, which keeps growing with values from dict2, and we know that no key appears in dict2 more than once.
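// Pre-size the result so it never has to grow while the pairs are added.
var result = new Dictionary<MyKeyType, MyValueType>(dict1.Count + dict2.Count + dict3.Count
    + dict4.Count + dict5.Count);
// dict1 cannot contain duplicate keys, so its pairs are added without checks.
foreach(var pair in dict1) {
    result.Add(pair.Key, pair.Value);
}
// Only dict1's keys can collide with dict2's here, so checking dict1 is enough.
foreach(var pair in dict2) {
    if (!dict1.ContainsKey(pair.Key)) result.Add(pair.Key, pair.Value);
}
// From here on, skip any key already taken from an earlier dictionary (first value wins).
foreach(var pair in dict3) {
    if (!result.ContainsKey(pair.Key)) result.Add(pair.Key, pair.Value);
}
foreach(var pair in dict4) {
    if (!result.ContainsKey(pair.Key)) result.Add(pair.Key, pair.Value);
}
foreach(var pair in dict5) {
    if (!result.ContainsKey(pair.Key)) result.Add(pair.Key, pair.Value);
}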
In a timing test with 5 large dictionaries having mostly unique keys between them, it worked out like this (times in milliseconds):
In the case of a key being in multiple dictionaries, the first value is the one that's used, because you haven't specified any specific way you want to handle that situation.
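The five loops above can be generalized to any number of inputs. The helper below is a hypothetical sketch (MergeFirstWins is an invented name), assuming .NET Core 2.0 or later for Dictionary.TryAdd and C# 9 top-level statements. Like the loops, it pre-sizes the result with the sum of all counts and keeps the first value seen for each key.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: merges any number of dictionaries into one,
// keeping the first value seen for each key.
Dictionary<TKey, TValue> MergeFirstWins<TKey, TValue>(
    params IReadOnlyDictionary<TKey, TValue>[] sources)
{
    // The sum of all counts is an upper bound on the result size,
    // so the dictionary never has to resize.
    int capacity = 0;
    foreach (var source in sources)
    {
        capacity += source.Count;
    }

    var result = new Dictionary<TKey, TValue>(capacity);
    foreach (var source in sources)
    {
        foreach (var pair in source)
        {
            // TryAdd does nothing (and returns false) if the key already exists.
            result.TryAdd(pair.Key, pair.Value);
        }
    }
    return result;
}

var dict1 = new Dictionary<string, int> { ["a"] = 1, ["b"] = 2 };
var dict2 = new Dictionary<string, int> { ["b"] = 20, ["c"] = 3 };
var dict3 = new Dictionary<string, int> { ["c"] = 30, ["d"] = 4 };

var merged = MergeFirstWins<string, int>(dict1, dict2, dict3);
Console.WriteLine(merged.Count); // 4
```

If the last value should win instead, replacing the TryAdd call with the indexer assignment result[pair.Key] = pair.Value overwrites earlier entries. On .NET Framework, where TryAdd is not available, the ContainsKey check from the loops does the same job.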
Upvotes: 2