Reputation: 996
I have a list of dictionaries that all have the same keys. Some of the dictionaries are identical except for one value. How can I merge those into a single dictionary that holds the differing values as a list?
Let me give you an example.
Let's say I have these dictionaries:
[{'a':1, 'b':2,'c':3},{'a':1, 'b':2,'c':4},{'a':1, 'b':3,'c':3},{'a':1, 'b':3,'c':4}]
My desired output would be this:
[{'a':1, 'b':2,'c':[3,4]},{'a':1, 'b':3,'c':[3,4]}]
I've tried using nested for and if statements, but it's too expensive and nasty, and I'm sure there must be a better way. Could you give me a hand?
How could I do that for any kind of dictionary, assuming that the number of keys is the same across the dictionaries and that the name of the key to be merged into a list (c in this case) is known?
Thanks!
Upvotes: 3
Views: 1058
Reputation: 26315
Use a collections.defaultdict to group the c values by a and b tuple keys:
from collections import defaultdict
lst = [
    {"a": 1, "b": 2, "c": 3},
    {"a": 1, "b": 2, "c": 4},
    {"a": 1, "b": 3, "c": 3},
    {"a": 1, "b": 3, "c": 4},
]
d = defaultdict(list)
for x in lst:
    d[x["a"], x["b"]].append(x["c"])
result = [{"a": a, "b": b, "c": c} for (a, b), c in d.items()]
print(result)
Could also use itertools.groupby if lst is already ordered by a and b:
from itertools import groupby
from operator import itemgetter
lst = [
    {"a": 1, "b": 2, "c": 3},
    {"a": 1, "b": 2, "c": 4},
    {"a": 1, "b": 3, "c": 3},
    {"a": 1, "b": 3, "c": 4},
]
result = [
    {"a": a, "b": b, "c": [x["c"] for x in g]}
    for (a, b), g in groupby(lst, key=itemgetter("a", "b"))
]
print(result)
Or if lst is not ordered by a and b, we can sort by those two keys as well:
result = [
    {"a": a, "b": b, "c": [x["c"] for x in g]}
    for (a, b), g in groupby(
        sorted(lst, key=itemgetter("a", "b")), key=itemgetter("a", "b")
    )
]
print(result)
Output:
[{'a': 1, 'b': 2, 'c': [3, 4]}, {'a': 1, 'b': 3, 'c': [3, 4]}]
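The ordering requirement matters because groupby only merges adjacent items with equal keys. Here is a small sketch of what happens otherwise; unsorted_lst and group_c are names made up for this illustration, and the input is the same data in an interleaved order:

```python
from itertools import groupby
from operator import itemgetter

# Illustrative input (not from the question): same records, interleaved
unsorted_lst = [
    {"a": 1, "b": 2, "c": 3},
    {"a": 1, "b": 3, "c": 3},
    {"a": 1, "b": 2, "c": 4},
    {"a": 1, "b": 3, "c": 4},
]

key = itemgetter("a", "b")


def group_c(items):
    """Collect the c values of each run of consecutive items sharing (a, b)."""
    return [
        {"a": a, "b": b, "c": [x["c"] for x in g]}
        for (a, b), g in groupby(items, key=key)
    ]


# Without sorting, every record ends up in its own one-element group
split_groups = group_c(unsorted_lst)

# Sorting first makes equal keys adjacent, so they merge as intended
merged_groups = group_c(sorted(unsorted_lst, key=key))

print(len(split_groups))  # 4
print(merged_groups)
```

So if you can't guarantee the order of the input, either sort first or go with the defaultdict version, which doesn't care about order.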
For a more generic solution that works with any number of keys:
def merge_lst_dicts(lst, keys, merge_key):
    groups = defaultdict(list)
    for item in lst:
        key = tuple(item.get(k) for k in keys)
        groups[key].append(item.get(merge_key))
    return [
        {**dict(zip(keys, group_key)), **{merge_key: merged_values}}
        for group_key, merged_values in groups.items()
    ]
print(merge_lst_dicts(lst, ["a", "b"], "c"))
# [{'a': 1, 'b': 2, 'c': [3, 4]}, {'a': 1, 'b': 3, 'c': [3, 4]}]
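As a quick usage sketch with more than two grouping keys: the records and field names below (host, port, env, tag) are invented for illustration, and the function above is repeated in slightly condensed form only so the snippet runs on its own.

```python
from collections import defaultdict


def merge_lst_dicts(lst, keys, merge_key):
    # Same logic as the function above, repeated so this block is standalone
    groups = defaultdict(list)
    for item in lst:
        groups[tuple(item.get(k) for k in keys)].append(item.get(merge_key))
    return [
        {**dict(zip(keys, group_key)), merge_key: merged_values}
        for group_key, merged_values in groups.items()
    ]


# Hypothetical records: three grouping keys and one key to merge
records = [
    {"host": "web1", "port": 80, "env": "prod", "tag": "x"},
    {"host": "web1", "port": 80, "env": "prod", "tag": "y"},
    {"host": "web1", "port": 443, "env": "prod", "tag": "x"},
]

merged = merge_lst_dicts(records, ["host", "port", "env"], "tag")
print(merged)
```

Note that a group with a single member still gets a one-element list for the merged key, which keeps the output shape uniform.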
Upvotes: 3
Reputation: 4587
You could use a temporary dict to solve this problem:
$ python3
Python 3.6.9 (default, Nov 7 2019, 10:44:02)
>>> di=[{'a':1, 'b':2,'c':3},{'a':1, 'b':2,'c':4},{'a':1, 'b':3,'c':3},{'a':1, 'b':3,'c':4}]
>>> from collections import defaultdict as dd
>>> dt=dd(list) #default dict of list
>>> for d in di: #create temp dict with 'a','b' as tuple and append 'c'
...     dt[d['a'],d['b']].append(d['c'])
...
>>> ol=[] #output list
>>> for k,v in dt.items(): #Create final output from temp
...     ol.append({'a':k[0],'b':k[1], 'c':v})
...
>>> ol #output
[{'a': 1, 'b': 2, 'c': [3, 4]}, {'a': 1, 'b': 3, 'c': [3, 4]}]
If the number of keys in the input dicts is large, extracting the tuple for the temp dict can be automated.
If the keys that define the merge condition are known, they can simply be a constant tuple, e.g.
keys=('a','b') #in this case, merging happens over these keys
If they are not known until runtime, we can get them using the zip function and a set difference, e.g.
>>> di
[{'a': 1, 'b': 2, 'c': 3}, {'a': 1, 'b': 2, 'c': 4}, {'a': 1, 'b': 3, 'c': 3}, {'a': 1, 'b': 3, 'c': 4}]
>>> key_to_ignore_for_merge='c'
>>> keys=tuple(sorted(set(list(zip(*zip(*di)))[0])-{key_to_ignore_for_merge})) #sorted, since set order is not guaranteed
>>> keys
('a', 'b')
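One detail to watch in a set difference like this: set() applied to a string yields its individual characters, so excluding a key by passing its name to set() only behaves as intended for single-character names such as 'c'. A small sketch with made-up key names:

```python
# Made-up key names, chosen so that one key is longer than one character
all_keys = {"a", "b", "t", "cost"}
key_to_ignore = "cost"

# set("cost") is {'c', 'o', 's', 't'}: this drops 't' and keeps 'cost'
by_characters = all_keys - set(key_to_ignore)

# A one-element set removes exactly the intended key
by_key = all_keys - {key_to_ignore}

print(sorted(by_characters))  # ['a', 'b', 'cost']
print(sorted(by_key))         # ['a', 'b', 't']
```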
At this point, we can use map to extract the tuple of values for those keys only:
>>> dt=dd(list)
>>> for d in di:
...     dt[tuple(map(d.get,keys))].append(d[key_to_ignore_for_merge])
...
>>> dt
defaultdict(<class 'list'>, {(1, 2): [3, 4], (1, 3): [3, 4]})
Now, recreating the dictionaries from the defaultdict and keys requires some zip magic again!
>>> ol=[] #start from an empty output list
>>> for k,v in dt.items():
...     dtt=dict(zip(keys, k))
...     dtt[key_to_ignore_for_merge]=v
...     ol.append(dtt)
...
>>> ol
[{'a': 1, 'b': 2, 'c': [3, 4]}, {'a': 1, 'b': 3, 'c': [3, 4]}]
This solution assumes that you know only the key whose values can differ (e.g. 'c'); everything else is determined at runtime.
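If you'd rather have this as one reusable function, here is a minimal sketch. The name merge_on is made up, and it takes the grouping keys from the first dict with a comprehension instead of the zip and set difference used above, on the assumption that all dicts share the same keys and that the grouping values are hashable:

```python
from collections import defaultdict


def merge_on(dicts, merge_key):
    """Group dicts by every key except merge_key; collect merge_key values in a list."""
    if not dicts:
        return []
    # Assumption: every dict has the same keys as the first one
    keys = tuple(k for k in dicts[0] if k != merge_key)
    groups = defaultdict(list)
    for d in dicts:
        # The tuple of the remaining values identifies the group
        groups[tuple(d[k] for k in keys)].append(d[merge_key])
    # Rebuild one dict per group, attaching the collected values
    return [{**dict(zip(keys, k)), merge_key: v} for k, v in groups.items()]


di = [
    {'a': 1, 'b': 2, 'c': 3},
    {'a': 1, 'b': 2, 'c': 4},
    {'a': 1, 'b': 3, 'c': 3},
    {'a': 1, 'b': 3, 'c': 4},
]
merged = merge_on(di, 'c')
print(merged)
```

This keeps the keys in the order they appear in the first dict, so the output doesn't depend on set ordering.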
Upvotes: 2