Pytharelle
Pytharelle

Reputation: 73

How to merge dicts when sharing common value?

I currently have the following data taken off my database (list of 4 dicts):

raw_data = [{'budget_id': 1, 'name': 'Maria', 'amount': 980, 'user': '10', 'gift': 'Phone', 'cost': 325}, {'budget_id': 1, 'name': 'Maria', 'amount': 980, 'user': '10', 'gift': 'Flower', 'cost': 195}, {'budget_id': 2, 'name': 'Scott', 'amount': 2100, 'user': '10', 'gift': 'Paris Trip', 'cost': 599}, {'budget_id': 2, 'name': 'Scott', 'amount': 2100, 'user': '10', 'gift': 'Ring', 'cost': 1200}]

I would like to automatically merge each dictionary sharing the same "budget_id" so that in the end I end up with the following (list of 2 dicts) :

final_data = [{'name': ['Maria'], 'cost': [195, 325], 'gift': ['Phone', 'Flower'], 'budget_id': [1], 'user': ['10'], 'amount': [980]}, {{'name': ['Scott'], 'cost': [599, 1200], 'gift': ['Paris Trip', 'Ring'], 'budget_id': [2], 'user': ['10'], 'amount': [2100]}

The idea is that it should not be hardcoded as the amount of incoming dictionary from the website will gradually move and reduce overtime.

I so far have managed to find a hardcoded solution allowing me to merge the first and second dictionary successfully :

Merge_Init = {}
for key in (fibud.keys() | sebud.keys()):
    if key in final_data[0]: Merge_Init.setdefault(key, []).append(final_data[0][key])
    if key in final_data[1]: Merge_Init.setdefault(key, []).append(final_data[1][key])
print(Merge_Init)

final_merge = {a:list(set(b)) for a, b in Merge_Init.items()}
print("New Dict without Duplicates:", final_merge)

Resulting in :

New Dict without Duplicates: {'user': ['10'], 'gift': ['Phone', 'Flower'], 'name': ['Maria'], 'amount': [980], 'budget_id': [1], 'cost': [195, 325]}

Note that on this occasion, the cost of the gifts inverted themselves somewhat, the cost should be [325, 195].

Can you please assist in finding a solution to loop through each budget and merge them when they share the same budget_id without any hardcoded solution like "final_data[0]" while keeping data integrity?

Thanks a lot !

Upvotes: 2

Views: 99

Answers (3)

Deepak Tripathi
Deepak Tripathi

Reputation: 3233

You can use pandas for the same like this :

import pandas as pd

df = pd.DataFrame([{'budget_id': 1, 'name': 'Maria', 'amount': 980, 'user': '10', 'gift': 'Phone', 'cost': 325}, {'budget_id': 1, 'name': 'Maria', 'amount': 980, 'user': '10', 'gift': 'Flower', 'cost': 195}, {'budget_id': 2, 'name': 'Scott', 'amount': 2100, 'user': '10', 'gift': 'Paris Trip', 'cost': 599}, {'budget_id': 2, 'name': 'Scott', 'amount': 2100, 'user': '10', 'gift': 'Ring', 'cost': 1200}])
df = df.groupby('budget_id').agg({'name': set,
                                   'amount': set,
                                   'cost': set ,
                                   'user': set,
                                   'gift': set}).reset_index()
print(df.to_dict('records'))

Upvotes: 2

atru
atru

Reputation: 4744

Here is one solution, with raw_data extended for a better demonstration:

raw_data = [{'budget_id': 1, 'name': 'Maria', 'amount': 980, 'user': '10', 'gift': 'Phone', 'cost': 325}, {'budget_id': 1, 'name': 'Maria', 'amount': 980, 'user': '10', 'gift': 'Flower', 'cost': 195}, {'budget_id': 2, 'name': 'Scott', 'amount': 2100, 'user': '10', 'gift': 'Paris Trip', 'cost': 599}, {'budget_id': 2, 'name': 'Scott', 'amount': 2100, 'user': '10', 'gift': 'Ring', 'cost': 1200}, {'budget_id': 2, 'name': 'Scott', 'amount': 2100, 'user': '10', 'gift': 'Watch', 'cost': 240}]

final_data = []

for entry in raw_data:

    found = False
    for ind, final in enumerate(final_data):
        # Look if the budget entry already exists
        if entry['budget_id'] in final['budget_id']:
            found = True
            break

    if found:
        # Merge
        # Everything - issue if any entry just happens 
        # to be the same (like cost)
        #for key, value in entry.items():
            #if not (entry[key] in final[key]):
            #   final_data[ind][key].append(entry[key])

        # Alternative - specific entries only
        final_data[ind]['gift'].append(entry['gift'])
        final_data[ind]['cost'].append(entry['cost'])
    else:
        # If not yet there - add it as a new item, converting 
        # all values to lists
        final_data.append({x:[y] for x,y in entry.items()})

print(final_data)

The code loops through all dictionaries in the raw_data loop. For each dictionary, it then loops through all existing entries of final_data keeping track of the index with enumerate. Using the budget_id it checks if the budget entry was already encountered and stored. If that's the case, it sets an appropriate flag and breaks the loop.

In the second part, if the entry was not yet encountered, it is appended to the final_data list, as a dictionary with all its values converted to lists.

If it was already there - the data is merged. Here there are two options, one, the commented out one, - it merges everything together if the values are not the same/yet present. This is not good for things like item prices that can easily be repetitive but I left it for completeness.

In the second, current version it simply looks for the specific items-keys and merges with them. This assumes that even if there are duplicates in gifts, they should be included.

Upvotes: 0

Nikola Dim
Nikola Dim

Reputation: 846

This solution doesn't hardcode any element of array, but since you have different requiremenst for different keys, then those keys need to be hardcoded in order to handle them properly. In your example after merging you had only ['Maria'], while if you had two items cost the same, surely you would expect the cost to be [150, 150], and not [150].

final = {} # key is budget_id, and value is the all dictionaries merged 
for dict in list:
   budget_id = dict['budget_id']
   if budget_id in final:
        # for each key you'll do something like this
        dictToModify = final[budget_id]
        dictToModify.append(dict['gift'])
        # for each key in dictionary, add it to the list
        # some will be added always to the list: e.g. cost
        # some will be added only once, e.g. name and budget_id

   else:
       # here you're just putting everything in a list as in your final_data example
       final[ budget_id ] = {key:[value] for (key,value) in dict.items()}

final_data = list(final.values())

This loop adds elements to final dictionary if they don't exist there, or merges the if an element with same budget_id is already present. The last step of the loop is to convert it to a list of dictionaries.

Upvotes: 1

Related Questions