Thirst for Knowledge

Reputation: 1628

Merge lists of Python dictionaries using multiple keys

I want to merge two lists of dictionaries, using multiple keys.

I have a single list of dicts with one set of results:

l1 = [{'id': 1, 'year': '2017', 'resultA': 2},
      {'id': 2, 'year': '2017', 'resultA': 3},
      {'id': 1, 'year': '2018', 'resultA': 3},
      {'id': 2, 'year': '2018', 'resultA': 5}]

And another list of dicts for another set of results:

l2 = [{'id': 1, 'year': '2017', 'resultB': 5},
      {'id': 2, 'year': '2017', 'resultB': 8},
      {'id': 1, 'year': '2018', 'resultB': 7},
      {'id': 2, 'year': '2018', 'resultB': 9}]

And I want to combine them using the 'id' and 'year' keys to get the following:

all = [{'id': 1, 'year': '2017', 'resultA': 2, 'resultB': 5},
       {'id': 2, 'year': '2017', 'resultA': 3, 'resultB': 8},
       {'id': 1, 'year': '2018', 'resultA': 3, 'resultB': 7},
       {'id': 2, 'year': '2018', 'resultA': 5, 'resultB': 9}]

I know that for combining two lists of dicts on a single key, I can use this:

l1 = {d['id']:d for d in l1} 

all = [dict(d, **l1.get(d['id'], {})) for d in l2]  

But it ignores the year. Each id appears twice in l1, so the 2018 entries overwrite the 2017 ones in the lookup dict, giving the following incorrect result:

all = [{'id': 1, 'year': '2018', 'resultA': 3, 'resultB': 5},
       {'id': 2, 'year': '2018', 'resultA': 5, 'resultB': 8},
       {'id': 1, 'year': '2018', 'resultA': 3, 'resultB': 7},
       {'id': 2, 'year': '2018', 'resultA': 5, 'resultB': 9}]

Treating this as I would in R, by adding in the second variable I want to merge on, I get a KeyError:

l1 = {d['id','year']:d for d in l1} 

all = [dict(d, **l1.get(d['id','year'], {})) for d in l2]   

How do I merge using multiple keys?

Upvotes: 1

Views: 2056

Answers (3)

Sohaib Farooqi

Reputation: 5676

You can concatenate both lists and group the combined list on id and year, then merge the dicts within each group.

Grouping can be done with itertools.groupby (which only groups adjacent items, so the list must first be sorted on the same key), and merging with collections.ChainMap:

>>> from itertools import groupby
>>> from collections import ChainMap

>>> [dict(ChainMap(*list(g))) for _,g in groupby(sorted(l1+l2, key=lambda x: (x['id'],x['year'])),key=lambda x: (x['id'],x['year']))]
[{'resultA': 2, 'id': 1, 'resultB': 5, 'year': '2017'}, {'resultA': 3, 'id': 1, 'resultB': 7, 'year': '2018'}, {'resultA': 3, 'id': 2, 'resultB': 8, 'year': '2017'}, {'resultA': 5, 'id': 2, 'resultB': 9, 'year': '2018'}]

Alternatively, to avoid the lambdas, you can use operator.itemgetter:

>>> from operator import itemgetter
>>> [dict(ChainMap(*list(g))) for _,g in groupby(sorted(l1+l2, key=itemgetter('id', 'year')),key=itemgetter('id', 'year'))]

Upvotes: 2

jpp

Reputation: 164843

Expanding on @AlexHall's suggestion, you can use collections.defaultdict keyed on the (id, year) tuple:

from collections import defaultdict

d = defaultdict(dict)

for i in l1 + l2:
    results = {k: v for k, v in i.items() if k not in ('id', 'year')}
    d[(i['id'], i['year'])].update(results)

Result

defaultdict(dict,
            {(1, '2017'): {'resultA': 2, 'resultB': 5},
             (1, '2018'): {'resultA': 3, 'resultB': 7},
             (2, '2017'): {'resultA': 3, 'resultB': 8},
             (2, '2018'): {'resultA': 5, 'resultB': 9}})
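The result above is keyed by (id, year) with only the result fields inside. If the flat list-of-dicts shape from the question is needed, a sketch like the following rebuilds it from the same defaultdict; the `KEYS` tuple and the `merged` name are illustrative assumptions, not part of the answer.

```python
from collections import defaultdict

l1 = [{'id': 1, 'year': '2017', 'resultA': 2},
      {'id': 2, 'year': '2017', 'resultA': 3},
      {'id': 1, 'year': '2018', 'resultA': 3},
      {'id': 2, 'year': '2018', 'resultA': 5}]

l2 = [{'id': 1, 'year': '2017', 'resultB': 5},
      {'id': 2, 'year': '2017', 'resultB': 8},
      {'id': 1, 'year': '2018', 'resultB': 7},
      {'id': 2, 'year': '2018', 'resultB': 9}]

KEYS = ('id', 'year')  # hypothetical name for the join columns

d = defaultdict(dict)
for i in l1 + l2:
    # Store only the non-key fields under the composite (id, year) key.
    d[(i['id'], i['year'])].update({k: v for k, v in i.items() if k not in KEYS})

# Put the key fields back in front of each merged set of results.
merged = [{'id': id_, 'year': year, **results}
          for (id_, year), results in d.items()]

print(merged)
```

Since dicts preserve insertion order, the rows come out in the order their (id, year) pair first appears in l1 + l2, which here matches the expected output in the question.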

Upvotes: 0

Alex Hall

Reputation: 36053

Instead of d['id','year'], use the tuple (d['id'], d['year']) as your key.
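Applied to the code from the question, this might look like the sketch below. It also illustrates why the original attempt raised KeyError: `d['id', 'year']` is parsed as `d[('id', 'year')]`, a lookup of a single tuple key that none of the dicts contain. The names `missing`, `by_key` and `merged` are illustrative; `by_key` avoids rebinding l1, and `merged` avoids shadowing the built-in `all`.

```python
l1 = [{'id': 1, 'year': '2017', 'resultA': 2},
      {'id': 2, 'year': '2017', 'resultA': 3},
      {'id': 1, 'year': '2018', 'resultA': 3},
      {'id': 2, 'year': '2018', 'resultA': 5}]

l2 = [{'id': 1, 'year': '2017', 'resultB': 5},
      {'id': 2, 'year': '2017', 'resultB': 8},
      {'id': 1, 'year': '2018', 'resultB': 7},
      {'id': 2, 'year': '2018', 'resultB': 9}]

# d['id', 'year'] means d[('id', 'year')]: one tuple key, which is absent.
try:
    l1[0]['id', 'year']
except KeyError as exc:
    missing = exc.args[0]  # the tuple that was looked up

# Index l1 by the composite key built from the two values.
by_key = {(d['id'], d['year']): d for d in l1}

# For each l2 row, copy it and add the fields of the matching l1 row (if any).
merged = [dict(d, **by_key.get((d['id'], d['year']), {})) for d in l2]

print(merged)
```

Like the single-key version in the question, this keeps every row of l2 (rows without a match are copied unchanged) and drops rows of l1 that have no counterpart in l2.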

Upvotes: 5
