Capie

Reputation: 996

Merge several dictionaries creating array on different values

I have a list of several dictionaries that all have the same keys. Some dictionaries are identical except for one value. How can I merge them into one dictionary that holds the differing values as an array?

Let me give you an example:

Let's say I have these dictionaries:

[{'a':1, 'b':2,'c':3},{'a':1, 'b':2,'c':4},{'a':1, 'b':3,'c':3},{'a':1, 'b':3,'c':4}]

My desired output would be this:

[{'a':1, 'b':2,'c':[3,4]},{'a':1, 'b':3,'c':[3,4]}]

I've tried using nested for and if statements, but it's too expensive and messy, and I'm sure there must be a better way. Could you give me a hand?

How could I do that for any kind of dictionary, assuming that the number of keys is the same in all dictionaries and that I know the name of the key to be merged as an array (c in this case)?

thanks!

Upvotes: 3

Views: 1058

Answers (2)

RoadRunner

Reputation: 26315

Use a collections.defaultdict to group the c values by a and b tuple keys:

from collections import defaultdict

lst = [
    {"a": 1, "b": 2, "c": 3},
    {"a": 1, "b": 2, "c": 4},
    {"a": 1, "b": 3, "c": 3},
    {"a": 1, "b": 3, "c": 4},
]

d = defaultdict(list)
for x in lst:
    d[x["a"], x["b"]].append(x["c"])

result = [{"a": a, "b": b, "c": c} for (a, b), c in d.items()]

print(result)

You could also use itertools.groupby if lst is already ordered by a and b:

from itertools import groupby
from operator import itemgetter

lst = [
    {"a": 1, "b": 2, "c": 3},
    {"a": 1, "b": 2, "c": 4},
    {"a": 1, "b": 3, "c": 3},
    {"a": 1, "b": 3, "c": 4},
]

result = [
    {"a": a, "b": b, "c": [x["c"] for x in g]}
    for (a, b), g in groupby(lst, key=itemgetter("a", "b"))
]

print(result)

If lst is not ordered by a and b, sort it by those two keys first, since groupby only groups consecutive items:

result = [
    {"a": a, "b": b, "c": [x["c"] for x in g]}
    for (a, b), g in groupby(
        sorted(lst, key=itemgetter("a", "b")), key=itemgetter("a", "b")
    )
]

print(result)

Output:

[{'a': 1, 'b': 2, 'c': [3, 4]}, {'a': 1, 'b': 3, 'c': [3, 4]}]
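
To see why the ordering matters for groupby, here is a minimal sketch. The shuffled input unsorted_lst and the helper merge_with_groupby are hypothetical names made up for this illustration, not part of the answer itself:

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical input: the same rows as lst, but not ordered by "a" and "b".
unsorted_lst = [
    {"a": 1, "b": 2, "c": 3},
    {"a": 1, "b": 3, "c": 3},
    {"a": 1, "b": 2, "c": 4},
    {"a": 1, "b": 3, "c": 4},
]

group_key = itemgetter("a", "b")


def merge_with_groupby(rows):
    # groupby starts a new group whenever the key changes between neighbours,
    # so equal keys that are not adjacent end up in separate groups.
    return [
        {"a": a, "b": b, "c": [row["c"] for row in group]}
        for (a, b), group in groupby(rows, key=group_key)
    ]


without_sort = merge_with_groupby(unsorted_lst)
with_sort = merge_with_groupby(sorted(unsorted_lst, key=group_key))

print(len(without_sort))  # 4, nothing was merged
print(with_sort)
# [{'a': 1, 'b': 2, 'c': [3, 4]}, {'a': 1, 'b': 3, 'c': [3, 4]}]
```

Sorting costs O(n log n), so for unordered input the defaultdict version is probably the simpler choice.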

Update

For a more generic solution for any number of keys:

def merge_lst_dicts(lst, keys, merge_key):
    groups = defaultdict(list)

    for item in lst:
        key = tuple(item.get(k) for k in keys)
        groups[key].append(item.get(merge_key))

    return [
        {**dict(zip(keys, group_key)), **{merge_key: merged_values}}
        for group_key, merged_values in groups.items()
    ]

print(merge_lst_dicts(lst, ["a", "b"], "c"))
# [{'a': 1, 'b': 2, 'c': [3, 4]}, {'a': 1, 'b': 3, 'c': [3, 4]}]
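
If only the merge key is known, as the question describes, the grouping keys could be derived from the first dict instead of being passed in. This is a rough sketch under two assumptions: every dict has the same keys, and all grouping values are hashable. The name merge_on is hypothetical:

```python
from collections import defaultdict


def merge_on(lst, merge_key):
    """Group on every key except merge_key (hypothetical helper)."""
    if not lst:
        return []

    # Assumption: all dicts share the keys of the first one.
    keys = [k for k in lst[0] if k != merge_key]

    groups = defaultdict(list)
    for item in lst:
        # Indexing (not .get) raises KeyError if a dict lacks a key.
        groups[tuple(item[k] for k in keys)].append(item[merge_key])

    return [
        {**dict(zip(keys, group_key)), merge_key: values}
        for group_key, values in groups.items()
    ]


lst = [
    {"a": 1, "b": 2, "c": 3},
    {"a": 1, "b": 2, "c": 4},
    {"a": 1, "b": 3, "c": 3},
    {"a": 1, "b": 3, "c": 4},
]

print(merge_on(lst, "c"))
# [{'a': 1, 'b': 2, 'c': [3, 4]}, {'a': 1, 'b': 3, 'c': [3, 4]}]
```

Using item[k] rather than item.get(k) is a deliberate choice here: a dict with a missing key fails loudly instead of being grouped under None.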

Upvotes: 3

mahoriR

Reputation: 4587

You could use a temporary dict to solve this problem:


$ python3
Python 3.6.9 (default, Nov  7 2019, 10:44:02) 

>>> di=[{'a':1, 'b':2,'c':3},{'a':1, 'b':2,'c':4},{'a':1, 'b':3,'c':3},{'a':1, 'b':3,'c':4}]
>>> from collections import defaultdict as dd
>>> dt=dd(list) #default dict of list
>>> for d in di: #create temp dict with 'a','b' as tuple and append 'c'
...     dt[d['a'],d['b']].append(d['c'])
... 
>>> ol=[] #output list, must exist before appending to it
>>> for k,v in dt.items(): #Create final output from temp
...     ol.append({'a':k[0],'b':k[1], 'c':v})
... 
>>> ol #output
[{'a': 1, 'b': 2, 'c': [3, 4]}, {'a': 1, 'b': 3, 'c': [3, 4]}]

If the number of keys in the input dicts is large, extracting the tuple for the temp dict can be automated.

If the keys that define the condition for merging are known, then it can simply be a constant tuple, e.g.:

keys=('a','b') #in this case, merging happens over these keys

If they are not known until runtime, we can get these keys using the zip function and a set difference, e.g.:

>>> di
[{'a': 1, 'b': 2, 'c': 3}, {'a': 1, 'b': 2, 'c': 4}, {'a': 1, 'b': 3, 'c': 3}, {'a': 1, 'b': 3, 'c': 4}]
>>> key_to_ignore_for_merge='c'
>>> keys=tuple(sorted(set(list(zip(*zip(*di)))[0])-{key_to_ignore_for_merge})) #sorted: set order is not guaranteed
>>> keys
('a', 'b')
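
The double zip is easier to follow step by step. This is only an illustrative sketch; the intermediate names columns, rows and first_keys, and the key name "count", are made up here:

```python
di = [
    {"a": 1, "b": 2, "c": 3},
    {"a": 1, "b": 2, "c": 4},
    {"a": 1, "b": 3, "c": 3},
    {"a": 1, "b": 3, "c": 4},
]

# Iterating a dict yields its keys, so zip(*di) pairs up the keys of all dicts.
columns = list(zip(*di))
print(columns)
# [('a', 'a', 'a', 'a'), ('b', 'b', 'b', 'b'), ('c', 'c', 'c', 'c')]

# Zipping again transposes back: one tuple of keys per dict.
rows = list(zip(*columns))
first_keys = rows[0]
print(first_keys)  # ('a', 'b', 'c')

# This is the same as reading the keys from the first dict directly.
print(first_keys == tuple(di[0]))  # True

# Caveat: set() on a string splits it into characters, so a longer
# (hypothetical) key name such as "count" must be wrapped, e.g. {"count"}.
print(set("count") == {"count"})  # False
```

In other words, the double zip just recovers the keys of the first dict, and the set difference then drops the key to be merged.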

At this point, we can use map to extract the tuple of values for those keys only:

>>> dt=dd(list)
>>> for d in di:
...  dt[tuple(map(d.get,keys))].append(d[key_to_ignore_for_merge])
... 
>>> dt
defaultdict(<class 'list'>, {(1, 2): [3, 4], (1, 3): [3, 4]})

Now, recreating the dictionaries from the defaultdict and keys requires some zip magic again:

>>> ol=[] #start from an empty output list
>>> for k,v in dt.items():
...  dtt=dict(zip(keys, k))
...  dtt[key_to_ignore_for_merge]=v
...  ol.append(dtt)
... 
>>> ol
[{'a': 1, 'b': 2, 'c': [3, 4]}, {'a': 1, 'b': 3, 'c': [3, 4]}]


This solution assumes that you only know the key whose values can differ (e.g. 'c'); everything else is determined at runtime.

Upvotes: 2
