David

Reputation: 3066

Merge multiple dictionaries by joining values in a list, when they have the same key

Suppose you have a list of dictionaries like in this example:

[{a:1, b:2},
{a:1, b:3},
{a:2, b:2}]

If two dictionaries have the same value for key a, I want their b values joined into a list (here, 2 and 3):

[{a:1, b:[2,3]}, {a:2, b:2}]

Upvotes: 1

Views: 145

Answers (5)

westandskif

Reputation: 982

Here is how the convtools library can help with this:

from convtools import conversion as c

converter = (
    c.group_by(c.item("a"))
    .aggregate(
        {
            "a": c.item("a"),
            "b": c.ReduceFuncs.Array(c.item("b")).pipe(
                c.if_(c.call_func(len, c.this()) == 1, c.item(0), c.this())
            ),
        }
    )
    .gen_converter()
)

converter([{"a": 1, "b": 2}, {"a": 1, "b": 3}, {"a": 2, "b": 2}])

Upvotes: 0

Alain T.

Reputation: 42129

You can use a temporary dictionary to group the 'b' values for each 'a' value (no sorting required). Then build your resulting list from this temporary dictionary:

L1 = [{'a':1, 'b':2},
     {'a':1, 'b':3},
     {'a':2, 'b':2}]

D = dict()
for d in L1: D.setdefault(d['a'],[]).append(d['b'])
L2  = [ {'a':k,'b':v if len(v)>1 else v[0]} for k,v in D.items() ]

print(L2)
[{'a': 1, 'b': [2, 3]}, {'a': 2, 'b': 2}]

Note that it is generally not a good idea to have different types for the same key in regular data structures (i.e. sometimes an int, sometimes a list), because all the code that uses the data will need to special-case the type. You should have all 'b' values be lists, even if some of them have only one item.
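As a minimal sketch of that recommendation, the same setdefault grouping can simply skip the single-item unwrapping so that every 'b' is a list. The function name merge_uniform and its key/value parameters are hypothetical, chosen only for this illustration:

```python
def merge_uniform(records, key="a", value="b"):
    """Group `value` entries by `key`, always storing them as a list."""
    grouped = {}
    for record in records:
        # setdefault creates the empty list the first time a key is seen
        grouped.setdefault(record[key], []).append(record[value])
    # Every group stays a list, even when it holds a single item
    return [{key: k, value: v} for k, v in grouped.items()]


L1 = [{'a': 1, 'b': 2}, {'a': 1, 'b': 3}, {'a': 2, 'b': 2}]
print(merge_uniform(L1))
# [{'a': 1, 'b': [2, 3]}, {'a': 2, 'b': [2]}]
```

Code that consumes the result can then always iterate over 'b' without checking its type first.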

Upvotes: 2

piterbarg

Reputation: 8219

Pretty simple using pandas and groupby:

import pandas as pd
dl = [{'a':1, 'b':2},
{'a':1, 'b':3},
{'a':2, 'b':2}]
df = pd.DataFrame.from_records(dl)
df2 = df.groupby('a')['b'].agg(list).reset_index()
df2.to_dict(orient = 'records')

produces

[{'a': 1, 'b': [2, 3]}, {'a': 2, 'b': [2]}]

Edit

If single values should not be turned into lists then the df2=... line should be replaced with:

...
df2 = df.groupby('a')['b'].apply(lambda c:list(c) if len(c)>1 else c.iloc[0]).reset_index()
...

Upvotes: 2

ddejohn

Reputation: 8960

Alright, here's a pure Python solution. Notice that the end result does not place non-collided b values in a list, which seems to be what you want. For what it's worth, this problem becomes much easier if you allow all b keys to point to lists, but to each their own.

Suppose you have this list of dictionaries:

In [1]: d_list
Out[1]:
[{'a': 2, 'b': 1},
 {'a': 1, 'b': 4},
 {'a': 2, 'b': 1},
 {'a': 2, 'b': 4},
 {'a': 4, 'b': 2},
 {'a': 1, 'b': 3},
 {'a': 1, 'b': 1}]

First, group them by the values of a:

In [2]: groups = {}
   ...: for d in d_list:
   ...:     for key, val in d.items():
   ...:         if key == "a":
   ...:             groups[(key, val)] = groups.get((key, val), []) + [d]
   ...:

In [3]: groups
Out[3]:
{('a', 2): [{'a': 2, 'b': 1}, {'a': 2, 'b': 1}, {'a': 2, 'b': 4}],
 ('a', 1): [{'a': 1, 'b': 4}, {'a': 1, 'b': 3}, {'a': 1, 'b': 1}],
 ('a', 4): [{'a': 4, 'b': 2}]}

Then create a new_d_list and iterate over each group:

In [4]: new_d_list = []
   ...: for (a_key, a_val), group in groups.items():
   ...:     b_val = [d["b"] for d in group] if len(group) > 1 else group[0]["b"]
   ...:     new_d = {a_key: a_val, "b": b_val}
   ...:     new_d_list.append(new_d)
   ...:

Output:

In [5]: new_d_list
Out[5]:
[{'a': 2, 'b': [1, 1, 4]},
 {'a': 1, 'b': [4, 3, 1]},
 {'a': 4, 'b': 2}]

Upvotes: 4

Alex Waygood

Reputation: 7579

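You can do this using itertools.groupby, provided the list is already sorted by a (groupby only groups consecutive items with the same key).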

from itertools import groupby
from operator import itemgetter

a, b = 'a', 'b'
original_list = [{a:1, b:2}, {a:1, b:3}, {a:2, b:2}]
new_list = []

for key, group in groupby(original_list, key=itemgetter(a)):
    group_list = list(group)

    if len(group_list) > 1:
        b_val = [d[b] for d in group_list]
    else:
        b_val = group_list[0][b]

    new_list.append({a: key, b: b_val})
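Because itertools.groupby only merges adjacent items, an input whose equal a values are not next to each other would produce several separate groups for the same key. A hedged sketch of one way to handle that is to sort by the key first; the function name merge_sorted and the sample unsorted_list are assumptions made for this illustration:

```python
from itertools import groupby
from operator import itemgetter


def merge_sorted(records, key="a", value="b"):
    """Sort by `key`, then group adjacent records with itertools.groupby."""
    get_key = itemgetter(key)
    merged = []
    # sorted() is stable, so records with equal keys keep their input order
    for k, group in groupby(sorted(records, key=get_key), key=get_key):
        values = [d[value] for d in group]
        # Unwrap single-item groups, as the question asks
        merged.append({key: k, value: values if len(values) > 1 else values[0]})
    return merged


unsorted_list = [{'a': 1, 'b': 2}, {'a': 2, 'b': 2}, {'a': 1, 'b': 3}]
print(merge_sorted(unsorted_list))
# [{'a': 1, 'b': [2, 3]}, {'a': 2, 'b': 2}]
```

Sorting costs O(n log n) and requires the key values to be mutually comparable, so a dictionary-based grouping may be preferable when neither is acceptable.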

Another possible solution would be to use collections.defaultdict:

from collections import defaultdict

a, b = 'a', 'b'
original_list = [{a:1, b:2}, {a:1, b:3}, {a:2, b:2}]
d = defaultdict(list)

for dictionary in original_list:
    d[dictionary[a]].append(dictionary[b])

new_list = [
    {a: key, b: (val if len(val) > 1 else val[0])}
    for key, val in d.items()
]

Upvotes: 3
