Reputation: 73
I currently have the following data taken off my database (list of 4 dicts):
raw_data = [{'budget_id': 1, 'name': 'Maria', 'amount': 980, 'user': '10', 'gift': 'Phone', 'cost': 325}, {'budget_id': 1, 'name': 'Maria', 'amount': 980, 'user': '10', 'gift': 'Flower', 'cost': 195}, {'budget_id': 2, 'name': 'Scott', 'amount': 2100, 'user': '10', 'gift': 'Paris Trip', 'cost': 599}, {'budget_id': 2, 'name': 'Scott', 'amount': 2100, 'user': '10', 'gift': 'Ring', 'cost': 1200}]
I would like to automatically merge each dictionary sharing the same "budget_id" so that in the end I end up with the following (list of 2 dicts) :
final_data = [{'name': ['Maria'], 'cost': [195, 325], 'gift': ['Phone', 'Flower'], 'budget_id': [1], 'user': ['10'], 'amount': [980]}, {{'name': ['Scott'], 'cost': [599, 1200], 'gift': ['Paris Trip', 'Ring'], 'budget_id': [2], 'user': ['10'], 'amount': [2100]}
The idea is that it should not be hardcoded as the amount of incoming dictionary from the website will gradually move and reduce overtime.
I so far have managed to find a hardcoded solution allowing me to merge the first and second dictionary successfully :
Merge_Init = {}
for key in (fibud.keys() | sebud.keys()):
if key in final_data[0]: Merge_Init.setdefault(key, []).append(final_data[0][key])
if key in final_data[1]: Merge_Init.setdefault(key, []).append(final_data[1][key])
print(Merge_Init)
final_merge = {a:list(set(b)) for a, b in Merge_Init.items()}
print("New Dict without Duplicates:", final_merge)
Resulting in :
New Dict without Duplicates: {'user': ['10'], 'gift': ['Phone', 'Flower'], 'name': ['Maria'], 'amount': [980], 'budget_id': [1], 'cost': [195, 325]}
Note that on this occasion, the cost of the gifts inverted themselves somewhat, the cost should be [325, 195].
Can you please assist in finding a solution to loop through each budget and merge them when they share the same budget_id without any hardcoded solution like "final_data[0]" while keeping data integrity?
Thanks a lot !
Upvotes: 2
Views: 99
Reputation: 3233
You can use pandas for the same like this :
import pandas as pd
df = pd.DataFrame([{'budget_id': 1, 'name': 'Maria', 'amount': 980, 'user': '10', 'gift': 'Phone', 'cost': 325}, {'budget_id': 1, 'name': 'Maria', 'amount': 980, 'user': '10', 'gift': 'Flower', 'cost': 195}, {'budget_id': 2, 'name': 'Scott', 'amount': 2100, 'user': '10', 'gift': 'Paris Trip', 'cost': 599}, {'budget_id': 2, 'name': 'Scott', 'amount': 2100, 'user': '10', 'gift': 'Ring', 'cost': 1200}])
df = df.groupby('budget_id').agg({'name': set,
'amount': set,
'cost': set ,
'user': set,
'gift': set}).reset_index()
print(df.to_dict('records'))
Upvotes: 2
Reputation: 4744
Here is one solution, with raw_data
extended for a better demonstration:
raw_data = [{'budget_id': 1, 'name': 'Maria', 'amount': 980, 'user': '10', 'gift': 'Phone', 'cost': 325}, {'budget_id': 1, 'name': 'Maria', 'amount': 980, 'user': '10', 'gift': 'Flower', 'cost': 195}, {'budget_id': 2, 'name': 'Scott', 'amount': 2100, 'user': '10', 'gift': 'Paris Trip', 'cost': 599}, {'budget_id': 2, 'name': 'Scott', 'amount': 2100, 'user': '10', 'gift': 'Ring', 'cost': 1200}, {'budget_id': 2, 'name': 'Scott', 'amount': 2100, 'user': '10', 'gift': 'Watch', 'cost': 240}]
final_data = []
for entry in raw_data:
found = False
for ind, final in enumerate(final_data):
# Look if the budget entry already exists
if entry['budget_id'] in final['budget_id']:
found = True
break
if found:
# Merge
# Everything - issue if any entry just happens
# to be the same (like cost)
#for key, value in entry.items():
#if not (entry[key] in final[key]):
# final_data[ind][key].append(entry[key])
# Alternative - specific entries only
final_data[ind]['gift'].append(entry['gift'])
final_data[ind]['cost'].append(entry['cost'])
else:
# If not yet there - add it as a new item, converting
# all values to lists
final_data.append({x:[y] for x,y in entry.items()})
print(final_data)
The code loops through all dictionaries in the raw_data
loop. For each dictionary, it then loops through all existing entries of final_data
keeping track of the index with enumerate
. Using the budget_id
it checks if the budget entry was already encountered and stored. If that's the case, it sets an appropriate flag and breaks the loop.
In the second part, if the entry was not yet encountered, it is appended to the final_data
list, as a dictionary with all its values converted to lists.
If it was already there - the data is merged. Here there are two options, one, the commented out one, - it merges everything together if the values are not the same/yet present. This is not good for things like item prices that can easily be repetitive but I left it for completeness.
In the second, current version it simply looks for the specific items-keys and merges with them. This assumes that even if there are duplicates in gifts
, they should be included.
Upvotes: 0
Reputation: 846
This solution doesn't hardcode any element of array, but since you have different requiremenst for different keys, then those keys need to be hardcoded in order to handle them properly. In your example after merging you had only ['Maria'], while if you had two items cost the same, surely you would expect the cost to be [150, 150], and not [150].
final = {} # key is budget_id, and value is the all dictionaries merged
for dict in list:
budget_id = dict['budget_id']
if budget_id in final:
# for each key you'll do something like this
dictToModify = final[budget_id]
dictToModify.append(dict['gift'])
# for each key in dictionary, add it to the list
# some will be added always to the list: e.g. cost
# some will be added only once, e.g. name and budget_id
else:
# here you're just putting everything in a list as in your final_data example
final[ budget_id ] = {key:[value] for (key,value) in dict.items()}
final_data = list(final.values())
This loop adds elements to final dictionary if they don't exist there, or merges the if an element with same budget_id is already present. The last step of the loop is to convert it to a list of dictionaries.
Upvotes: 1