Reputation: 7597
I have two lists of dictionaries, and I need to merge them whenever USA and GOOG are the same.
list1 = [{'USA': 'Eastern',
          'GOOG': '2019',
          'Up': {'Upfront': 45},
          'Right': {'Upfront': 12}},
         {'USA': 'Western',
          'GOOG': '2019',
          'Up': {'Upfront': 10},
          'Right': {'Upfront': 15}}]
list2 = [{'USA': 'Western',
          'GOOG': '2019',
          'Down': {'Downback': 35},
          'Right': {'Downback': 25}},
         {'USA': 'Eastern',
          'GOOG': '2018',
          'Down': {'Downback': 15},
          'Right': {'Downback': 55}}]
Since USA and GOOG have the same values in the 2nd element of list1 and the 1st element of list2, those two elements should be merged. The expected result is as follows:
Result = [{'USA': 'Eastern',
           'GOOG': '2019',
           'Up': {'Upfront': 45},
           'Right': {'Upfront': 12}},
          {'USA': 'Western',
           'GOOG': '2019',
           'Up': {'Upfront': 10},
           'Down': {'Downback': 35},
           'Right': {'Upfront': 15, 'Downback': 25}},
          {'USA': 'Eastern',
           'GOOG': '2018',
           'Down': {'Downback': 15},
           'Right': {'Downback': 55}}]
How can I write generic code for this? I tried using defaultdict, but I did not know how to concatenate an arbitrary number of the remaining dictionaries.
My attempt:
from collections import defaultdict

dd = defaultdict(list)  # collects every value seen for each key
dics = list1 + list2
for dic in dics:
    for key, val in dic.items():
        dd[key].append(val)
Upvotes: 2
Views: 1843
Reputation: 2152
list1 = [{'USA': 'Eastern',
'GOOG': '2019',
'Up': {'Upfront': 45},
'Right': {'Upfront': 12}},
{'USA': 'Western',
'GOOG': '2019',
'Up': {'Upfront': 10},
'Right': {'Upfront': 15}}]
list2=[{'USA': 'Western',
'GOOG': '2019',
'Down': {'Downback': 35},
'Right': {'Downback': 25}},
{'USA': 'Eastern',
'GOOG': '2018',
'Down': {'Downback': 15},
'Right': {'Downback': 55}}]
def mergeDicts(d1, d2):
    # Recursively merge d2 into d1 (d1 is modified in place)
    for k, v in d2.items():
        if k in d1:
            if isinstance(v, dict):
                mergeDicts(d1[k], v)
            else:
                d1[k] = v
        else:
            d1[k] = v

def merge_lists(list1, list2):
    merged_list = []
    for d1 in list1:
        for d2 in list2:
            if d1['USA'] == d2['USA'] and d1['GOOG'] == d2['GOOG']:
                mergeDicts(d1, d2)
                merged_list.append(d1)
                break
        else:
            # No match in list2: keep the record from list1 as is
            merged_list.append(d1)
    for d2 in list2:
        for d1 in list1:
            if d1['USA'] == d2['USA'] and d1['GOOG'] == d2['GOOG']:
                break
        else:
            # No match in list1: keep the record from list2 as is
            merged_list.append(d2)
    return merged_list
res1 = merge_lists(list1, list2)
print(res1)
"""
[{'USA': 'Eastern', 'GOOG': '2019', 'Up': {'Upfront': 45}, 'Right': {'Upfront': 12}},
{'USA': 'Western', 'GOOG': '2019', 'Up': {'Upfront': 10},
'Right': {'Upfront': 15, 'Downback': 25},
'Down': {'Downback': 35}},
{'USA': 'Eastern', 'GOOG': '2018', 'Down': {'Downback': 15}, 'Right': {'Downback': 55}}]
"""
Upvotes: 0
Reputation: 1899
Here is my attempt at a solution. It manages to reproduce the results you requested. Please ignore how badly named my variables are. I found this problem quite interesting.
def joinListByDictionary(list1, list2):
    """Join lists on USA and GOOG having the same value"""
    combined = list1 + list2  # new list, so the caller's list1 is not extended
    matchIndx = []  # indices of records that were folded into an earlier one
    for dicts in range(len(combined)):
        for dicts2 in range(dicts + 1, len(combined)):
            if combined[dicts]["GOOG"] == combined[dicts2]["GOOG"] and combined[dicts]["USA"] == combined[dicts2]["USA"]:
                # Fold the later record's nested dicts into the earlier record
                for key, value in combined[dicts2].items():
                    if isinstance(value, dict):
                        combined[dicts].setdefault(key, {}).update(value)
                matchIndx.append(dicts2)
    newList = [combined[ele] for ele in range(len(combined)) if ele not in matchIndx]
    print(newList)
    return newList
joinListByDictionary(list1, list2)
Upvotes: 1
Reputation: 110506
There are two algorithmic tasks in what you need: finding the records that have the same values for USA and GOOG, and then joining them in such a way that if the same key exists in both records, their values are merged.
The naive approach for the first would be a for loop that iterates over the values of list1 and, for each value, iterates over all values of list2. Two separate loops won't cut it; you need two nested for loops:
for element in list1:
    for other_element in list2:
        if ...:
            ...
While this approach works and is fine for small lists (fewer than 1000 records, for example), it takes time proportional to the product of the list sizes: for lists of around 1000 items each, we are talking about 1 million iterations. If the lists have 1,000,000 items each, the computation takes about 10^12 comparisons, which is impractical in pure Python.
So, a nice solution is to re-create one of the lists in a way that the comparison key is used as a hash: copy the list into a dictionary whose keys are the values you want to compare, and then iterate over the second list just once. As dictionary lookups take constant time on average, the number of comparisons becomes proportional to your list sizes.
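To make the difference concrete, here is a minimal sketch that counts the work done by each approach on synthetic records. The helper names (make_records, count_nested, count_indexed) and the generated data are assumptions made for illustration; they are not part of the question's data.

```python
def make_records(n, offset=0):
    """Build n synthetic records with unique (USA, GOOG) pairs (illustrative data)."""
    return [{"USA": f"region{i + offset}", "GOOG": "2019"} for i in range(n)]


def count_nested(list1, list2):
    """Count the key comparisons made by two nested loops."""
    comparisons = 0
    matches = 0
    for a in list1:
        for b in list2:
            comparisons += 1
            if a["USA"] == b["USA"] and a["GOOG"] == b["GOOG"]:
                matches += 1
    return comparisons, matches


def count_indexed(list1, list2):
    """Count the dictionary lookups made when list1 is indexed by (GOOG, USA)."""
    index = {(r["GOOG"], r["USA"]): r for r in list1}
    lookups = 0
    matches = 0
    for b in list2:
        lookups += 1
        if (b["GOOG"], b["USA"]) in index:
            matches += 1
    return lookups, matches


list_a = make_records(300)
list_b = make_records(300, offset=150)  # half of the keys overlap with list_a

print(count_nested(list_a, list_b))   # (90000, 150)
print(count_indexed(list_a, list_b))  # (300, 150)
```

Both approaches find the same 150 matches, but the nested loops need 300 * 300 comparisons while the indexed version needs one lookup per record of the second list.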
The second part of your task is to copy one record to a result list and update the keys on the resulting copy so that any duplicate keys are merged. To avoid problems when copying the records, it is safer to use Python's copy.deepcopy, which ensures the sub-dictionaries are different objects from the ones in the original record and stay isolated.
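As a small illustration of why the deep copy matters here, the sketch below compares a shallow copy with a deep copy of one record. The record is taken from the question's sample data; the 'Extra' key is made up for the demonstration.

```python
from copy import copy, deepcopy

record = {"USA": "Western", "GOOG": "2019", "Right": {"Upfront": 15}}

shallow = copy(record)    # new outer dict, but it shares the nested 'Right' dict
deep = deepcopy(record)   # the nested dicts are copied as well

shallow["Right"]["Downback"] = 25  # also changes record["Right"]
deep["Right"]["Extra"] = 99        # leaves record untouched ('Extra' is a made-up key)

print(record["Right"])  # {'Upfront': 15, 'Downback': 25}
print(deep["Right"])    # {'Upfront': 15, 'Extra': 99}
```

With a shallow copy, merging into the copied record would silently modify the sub-dictionaries of the original list as well.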
from copy import deepcopy

def merge_lists(list1, list2):
    # create dictionary from list1:
    dict1 = {(record["GOOG"], record["USA"]): record for record in list1}
    # compare elements in list2 to those on list1:
    result = {}
    for record in list2:
        ckey = record["GOOG"], record["USA"]
        new_record = deepcopy(record)
        if ckey in dict1:
            for key, value in dict1[ckey].items():
                if key in ("GOOG", "USA"):
                    # Do not merge these keys
                    continue
                # Dict's "setdefault" finds a key/value, and if it is missing
                # creates a new one with the second parameter as value
                new_record.setdefault(key, {}).update(value)
        result[ckey] = new_record
    # Add values from list1 that were not matched in list2:
    for key, value in dict1.items():
        if key not in result:
            result[key] = deepcopy(value)
    return list(result.values())
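The question also mentions an arbitrary number of lists. One possible generalisation of the same dictionary-index idea is sketched below. It assumes, as in the sample data, that every value other than USA and GOOG is itself a dictionary; merge_many and its keys parameter are hypothetical names introduced for this sketch only.

```python
from copy import deepcopy


def merge_many(*lists, keys=("USA", "GOOG")):
    """Merge records from any number of lists that share the same key values."""
    result = {}  # maps the (USA, GOOG) values to the merged record
    for records in lists:
        for record in records:
            ckey = tuple(record[k] for k in keys)
            if ckey not in result:
                # First record with this key: store an isolated copy
                result[ckey] = deepcopy(record)
                continue
            merged = result[ckey]
            for name, value in record.items():
                if name in keys:
                    continue
                # Assumption: value is a dict, as in the sample data
                merged.setdefault(name, {}).update(deepcopy(value))
    return list(result.values())


list1 = [{'USA': 'Eastern', 'GOOG': '2019',
          'Up': {'Upfront': 45}, 'Right': {'Upfront': 12}},
         {'USA': 'Western', 'GOOG': '2019',
          'Up': {'Upfront': 10}, 'Right': {'Upfront': 15}}]
list2 = [{'USA': 'Western', 'GOOG': '2019',
          'Down': {'Downback': 35}, 'Right': {'Downback': 25}},
         {'USA': 'Eastern', 'GOOG': '2018',
          'Down': {'Downback': 15}, 'Right': {'Downback': 55}}]

merged = merge_many(list1, list2)
for row in merged:
    print(row)
```

Because each record is visited once and looked up by key, the work stays proportional to the total number of records, and the input lists are left unmodified.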
Upvotes: 2
Reputation: 2836
Here is my attempt. Not sure if this is the best way, but it's a start.
Steps:
Code:
import operator as op
import itertools as it
from functools import reduce
from pprint import pprint

# list1 and list2 are the lists from the question
dictionaries = reduce(op.add, (list1, list2,))
# Sort (key, index) pairs so that groupby sees equal (USA, GOOG) keys side by side
groups = it.groupby(sorted([(op.itemgetter('USA', 'GOOG')(d), i)
                            for i, d in enumerate(dictionaries)]),
                    key=op.itemgetter(0))
results = []
for key, group in groups:
    _, indices = zip(*group)
    if len(indices) == 1:
        i, = indices
        results.append(dictionaries[i])
    else:
        merge = dictionaries[indices[0]]
        for i in indices[1:]:
            for k, v in dictionaries[i].items():
                if isinstance(v, dict):
                    # Merge nested dicts instead of overwriting them
                    merge.setdefault(k, {}).update(v)
                else:
                    merge[k] = v
        results.append(merge)
pprint(results, indent=4)
OUTPUT:
[   {   'Down': {'Downback': 15},
        'GOOG': '2018',
        'Right': {'Downback': 55},
        'USA': 'Eastern'},
    {   'GOOG': '2019',
        'Right': {'Upfront': 12},
        'USA': 'Eastern',
        'Up': {'Upfront': 45}},
    {   'Down': {'Downback': 35},
        'GOOG': '2019',
        'Right': {'Downback': 25, 'Upfront': 15},
        'USA': 'Western',
        'Up': {'Upfront': 10}}]
Upvotes: 1