Reputation: 1628
I want to merge two lists of dictionaries, using multiple keys.
I have a single list of dicts with one set of results:
l1 = [{'id': 1, 'year': '2017', 'resultA': 2},
      {'id': 2, 'year': '2017', 'resultA': 3},
      {'id': 1, 'year': '2018', 'resultA': 3},
      {'id': 2, 'year': '2018', 'resultA': 5}]
And another list of dicts for another set of results:
l2 = [{'id': 1, 'year': '2017', 'resultB': 5},
      {'id': 2, 'year': '2017', 'resultB': 8},
      {'id': 1, 'year': '2018', 'resultB': 7},
      {'id': 2, 'year': '2018', 'resultB': 9}]
And I want to combine them using the 'id' and 'year' keys to get the following:
all = [{'id': 1, 'year': '2017', 'resultA': 2, 'resultB': 5},
       {'id': 2, 'year': '2017', 'resultA': 3, 'resultB': 8},
       {'id': 1, 'year': '2018', 'resultA': 3, 'resultB': 7},
       {'id': 2, 'year': '2018', 'resultA': 5, 'resultB': 9}]
I know that for combining two lists of dicts on a single key, I can use this:
l1 = {d['id']:d for d in l1}
all = [dict(d, **l1.get(d['id'], {})) for d in l2]
But it ignores the year, providing the following incorrect result:
all = [{'id': 1, 'year': '2018', 'resultA': 3, 'resultB': 5},
       {'id': 2, 'year': '2018', 'resultA': 5, 'resultB': 8},
       {'id': 1, 'year': '2018', 'resultA': 3, 'resultB': 7},
       {'id': 2, 'year': '2018', 'resultA': 5, 'resultB': 9}]
Treating this as I would in R, by adding in the second variable I want to merge on, I get a KeyError:
l1 = {d['id','year']:d for d in l1}
all = [dict(d, **l1.get(d['id','year'], {})) for d in l2]
How do I merge using multiple keys?
Upvotes: 1
Views: 2056
Reputation: 5676
You can concatenate both lists, group the result by id and year, and then merge the dicts within each group.
Grouping can be done with itertools.groupby, and merging with collections.ChainMap:
>>> from itertools import groupby
>>> from collections import ChainMap
>>> [dict(ChainMap(*list(g))) for _,g in groupby(sorted(l1+l2, key=lambda x: (x['id'],x['year'])),key=lambda x: (x['id'],x['year']))]
[{'resultA': 2, 'id': 1, 'resultB': 5, 'year': '2017'}, {'resultA': 3, 'id': 1, 'resultB': 7, 'year': '2018'}, {'resultA': 3, 'id': 2, 'resultB': 8, 'year': '2017'}, {'resultA': 5, 'id': 2, 'resultB': 9, 'year': '2018'}]
Alternatively, to avoid the lambdas, you can use operator.itemgetter:
>>> from operator import itemgetter
>>> [dict(ChainMap(*list(g))) for _,g in groupby(sorted(l1+l2, key=itemgetter('id', 'year')),key=itemgetter('id', 'year'))]
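The same groupby/ChainMap approach can be sketched as a reusable function. The helper name merge_on_keys and its keys parameter are hypothetical, introduced only for illustration. Two details matter: groupby only groups consecutive items, so the combined list must be sorted on the same key first; and ChainMap returns the value from the first mapping that contains a key, so if both lists carried the same non-key field, the entry from the earlier list would win.

```python
from collections import ChainMap
from itertools import groupby
from operator import itemgetter


def merge_on_keys(*lists, keys=("id", "year")):
    """Hypothetical helper: merge several lists of dicts on the given keys."""
    key_func = itemgetter(*keys)
    # groupby only groups *consecutive* items with equal keys, so sort first.
    # sorted() is stable, so within a group dicts keep their list order.
    combined = sorted((d for lst in lists for d in lst), key=key_func)
    # ChainMap looks a key up in the first mapping that has it, so for
    # duplicated non-key fields the earliest list wins.
    return [dict(ChainMap(*group)) for _, group in groupby(combined, key=key_func)]


l1 = [{'id': 1, 'year': '2017', 'resultA': 2},
      {'id': 2, 'year': '2017', 'resultA': 3},
      {'id': 1, 'year': '2018', 'resultA': 3},
      {'id': 2, 'year': '2018', 'resultA': 5}]
l2 = [{'id': 1, 'year': '2017', 'resultB': 5},
      {'id': 2, 'year': '2017', 'resultB': 8},
      {'id': 1, 'year': '2018', 'resultB': 7},
      {'id': 2, 'year': '2018', 'resultB': 9}]

merged = merge_on_keys(l1, l2)
print(merged)
```

Groups that appear in only one list are kept as they are, so this behaves like an outer join; the output is ordered by (id, year) because of the sort.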
Upvotes: 2
Reputation: 164843
Expanding on @AlexHall's suggestion, you can use collections.defaultdict to help you:
from collections import defaultdict
d = defaultdict(dict)
for i in l1 + l2:
    results = {k: v for k, v in i.items() if k not in ('id', 'year')}
    d[(i['id'], i['year'])].update(results)
Result:
defaultdict(dict,
            {(1, '2017'): {'resultA': 2, 'resultB': 5},
             (1, '2018'): {'resultA': 3, 'resultB': 7},
             (2, '2017'): {'resultA': 3, 'resultB': 8},
             (2, '2018'): {'resultA': 5, 'resultB': 9}})
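The defaultdict above is keyed by (id, year) tuples rather than being a list of dicts. A minimal sketch of turning it back into the shape asked for in the question, assuming Python 3.7+ so that dicts preserve insertion order (the names merged and id_ are illustrative only):

```python
from collections import defaultdict

l1 = [{'id': 1, 'year': '2017', 'resultA': 2},
      {'id': 2, 'year': '2017', 'resultA': 3},
      {'id': 1, 'year': '2018', 'resultA': 3},
      {'id': 2, 'year': '2018', 'resultA': 5}]
l2 = [{'id': 1, 'year': '2017', 'resultB': 5},
      {'id': 2, 'year': '2017', 'resultB': 8},
      {'id': 1, 'year': '2018', 'resultB': 7},
      {'id': 2, 'year': '2018', 'resultB': 9}]

d = defaultdict(dict)
for i in l1 + l2:
    # Everything except the join keys is treated as a result field.
    results = {k: v for k, v in i.items() if k not in ('id', 'year')}
    d[(i['id'], i['year'])].update(results)

# Unpack each (id, year) tuple key back into fields. Keys were first
# inserted while walking l1, so the output follows l1's order.
merged = [{'id': id_, 'year': year, **fields} for (id_, year), fields in d.items()]
print(merged)
```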
Upvotes: 0
Reputation: 36053
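Instead of d['id','year'], use the tuple (d['id'], d['year']) as your key. d['id','year'] is the same as d[('id', 'year')]: it looks up a single key, the tuple ('id', 'year'), which your dicts do not contain, hence the KeyError.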
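Applied to the one-liner from the question, this gives the sketch below. It first shows where the KeyError comes from, then indexes l1 by (id, year) tuples. The name lookup is introduced here only to avoid rebinding l1; it is not part of the original code.

```python
l1 = [{'id': 1, 'year': '2017', 'resultA': 2},
      {'id': 2, 'year': '2017', 'resultA': 3},
      {'id': 1, 'year': '2018', 'resultA': 3},
      {'id': 2, 'year': '2018', 'resultA': 5}]
l2 = [{'id': 1, 'year': '2017', 'resultB': 5},
      {'id': 2, 'year': '2017', 'resultB': 8},
      {'id': 1, 'year': '2018', 'resultB': 7},
      {'id': 2, 'year': '2018', 'resultB': 9}]

# d['id', 'year'] looks up the single tuple key ('id', 'year'),
# which these dicts do not have, so it raises KeyError.
try:
    l1[0]['id', 'year']
except KeyError as exc:
    print('KeyError:', exc)

# Index l1 by the (id, year) tuple instead of a single field.
lookup = {(d['id'], d['year']): d for d in l1}
merged = [dict(d, **lookup.get((d['id'], d['year']), {})) for d in l2]
print(merged)
```

Rows of l2 without a match in l1 are kept with only their own fields, so this acts as a left join on l2; the output follows l2's order.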
Upvotes: 5