jyotiska

Reputation: 281

Merge python dictionaries with common key

Assume I have the following dictionaries:

{"name": "john", "place": "nyc", "owns": "gold", "quantity": 30}
{"name": "john", "place": "nyc", "owns": "silver", "quantity": 20}
{"name": "jane", "place": "nyc", "owns": "platinum", "quantity": 5}
{"name": "john", "place": "chicago", "owns": "brass", "quantity": 60}
{"name": "john", "place": "chicago", "owns": "silver", "quantity": 40}

I have hundreds of these small dictionaries. I need to merge them on a subset of common keys, in this example (name, place), and create a new dictionary for each combination. Ultimately, the output should look like the following:

{"name": "john", "place": "nyc", "gold": 30, "silver": 20}
{"name": "jane", "place": "nyc", "platinum": 5}
{"name": "john", "place": "chicago", "brass": 60, "silver": 40}

Is there an efficient way to do this? All I can think of is brute force: keep track of every possible name-place combination in a list, then traverse the entire data set again for each combination and merge the matching dictionaries into a new one. Thanks!

Upvotes: 3

Views: 638

Answers (5)

roippi

Reputation: 25954

First, getting the output that you asked for:

data = [{'name': "john", 'place': "nyc", 'owns': "gold", 'quantity': 30},
{'name': "john", 'place': "nyc", 'owns': "silver", 'quantity': 20},
{'name': "jane", 'place': "nyc", 'owns': "platinum", 'quantity': 5},
{'name': "john", 'place': "chicago", 'owns': "brass", 'quantity': 60},
{'name': "john", 'place': "chicago", 'owns': "silver", 'quantity': 40}]

from collections import defaultdict

accumulator = defaultdict(list)

for p in data:
    accumulator[p['name'],p['place']].append((p['owns'],p['quantity']))

from itertools import chain

[dict(chain([('name',name), ('place',place)], rest)) for (name,place),rest in accumulator.items()]
Out[13]: 
[{'name': 'jane', 'place': 'nyc', 'platinum': 5},
 {'brass': 60, 'name': 'john', 'place': 'chicago', 'silver': 40},
 {'gold': 30, 'name': 'john', 'place': 'nyc', 'silver': 20}]

Now I have to point out that this list-of-dicts data structure you've asked for is super awkward. Dicts are great for lookups, but they perform best when you can use a single one for the whole group of objects. If you have to search linearly through a bunch of dicts to find the one you want, you've immediately lost the benefit that dict provides in the first place. That leaves us with a couple of options: go one level deeper and nest dicts within our dict, or use something else entirely.
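To make the lookup point concrete, here is a minimal sketch, assuming Python 3 and the merged list-of-dicts output from above. The names find_linear and index are made up for illustration. It contrasts scanning the list with using one dict keyed by (name, place).

```python
# The merged output in the list-of-dicts shape the question asked for
merged = [
    {"name": "john", "place": "nyc", "gold": 30, "silver": 20},
    {"name": "jane", "place": "nyc", "platinum": 5},
    {"name": "john", "place": "chicago", "brass": 60, "silver": 40},
]

def find_linear(records, name, place):
    """Hypothetical helper: walk the list until a record matches."""
    for record in records:
        if record["name"] == name and record["place"] == place:
            return record
    return None

# One dict for the whole group, keyed by the (name, place) tuple
index = {(r["name"], r["place"]): r for r in merged}

print(find_linear(merged, "john", "chicago")["brass"])  # 60, found by scanning
print(index["john", "chicago"]["brass"])                # 60, found by hashing
```

Both calls return the same record; the difference is that the second one does not get slower as the list grows.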

May I suggest making a list of meaningful objects which each represent one of these people? Either create your own class, or use a namedtuple:

from collections import namedtuple

Person = namedtuple('Person','name place holdings')

[Person(name, place, dict(rest)) for (name,place), rest in accumulator.items()]
Out[17]: 
[Person(name='jane', place='nyc', holdings={'platinum': 5}),
 Person(name='john', place='chicago', holdings={'brass': 60, 'silver': 40}),
 Person(name='john', place='nyc', holdings={'silver': 20, 'gold': 30})]
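For the "create your own class" route, a rough sketch could look like the following. It assumes Python 3.7+ for dataclasses, and the class name Holder and its total_quantity method are invented here purely for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class Holder:
    """Hypothetical class: one person at one place, with their holdings."""
    name: str
    place: str
    holdings: dict = field(default_factory=dict)

    def total_quantity(self):
        # Sum the quantities across everything this person owns
        return sum(self.holdings.values())

data = [
    {"name": "john", "place": "nyc", "owns": "gold", "quantity": 30},
    {"name": "john", "place": "nyc", "owns": "silver", "quantity": 20},
    {"name": "jane", "place": "nyc", "owns": "platinum", "quantity": 5},
    {"name": "john", "place": "chicago", "owns": "brass", "quantity": 60},
    {"name": "john", "place": "chicago", "owns": "silver", "quantity": 40},
]

# Same accumulation step as above: group (owns, quantity) pairs by (name, place)
accumulator = defaultdict(list)
for p in data:
    accumulator[p["name"], p["place"]].append((p["owns"], p["quantity"]))

holders = [Holder(name, place, dict(rest))
           for (name, place), rest in accumulator.items()]

john_nyc = next(h for h in holders if (h.name, h.place) == ("john", "nyc"))
print(john_nyc.total_quantity())  # 50
```

The advantage over a namedtuple is mainly that behaviour such as total_quantity can live next to the data.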

Upvotes: 7

Sufian Latif

Reputation: 13356

Maybe a crazy idea, but how about a dict-of-dicts-of-dicts? This would work like a 2D array, with the names and places as the row and column indices.

my_dicts = [
    {"name": "john", "place": "nyc", "owns": "gold", "quantity": 30},
    {"name": "john", "place": "nyc", "owns": "silver", "quantity": 20},
    {"name": "jane", "place": "nyc", "owns": "platinum", "quantity": 5},
    {"name": "john", "place": "chicago", "owns": "brass", "quantity": 60},
    {"name": "john", "place": "chicago", "owns": "silver", "quantity": 40}
]

all_names = set(d["name"] for d in my_dicts)
all_places = set(d["place"] for d in my_dicts)

merged = {name : {place : {} for place in all_places} for name in all_names}

for d in my_dicts:
    merged[d["name"]][d["place"]][d["owns"]] = d["quantity"]

import pprint
pprint.pprint(merged)

# {'jane': {'chicago': {}, 'nyc': {'platinum': 5}},
#  'john': {'chicago': {'brass': 60, 'silver': 40},
#           'nyc': {'gold': 30, 'silver': 20}}}

Then convert to your desired format:

new_dicts = [{"name" : name, "place" : place} for name in all_names for place in all_places if merged[name][place]]
for d in new_dicts:
    d.update(merged[d["name"]][d["place"]])
pprint.pprint(new_dicts)

# [{'name': 'jane', 'place': 'nyc', 'platinum': 5},
#  {'gold': 30, 'name': 'john', 'place': 'nyc', 'silver': 20},
#  {'brass': 60, 'name': 'john', 'place': 'chicago', 'silver': 40}]
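The comprehension above creates a cell for every name and place combination, including empty ones such as jane in chicago. A possible variation, sketched here under the assumption of Python 3.5+ (for the ** unpacking in a dict literal), is a nested defaultdict that only creates the cells that are actually written to.

```python
from collections import defaultdict

my_dicts = [
    {"name": "john", "place": "nyc", "owns": "gold", "quantity": 30},
    {"name": "john", "place": "nyc", "owns": "silver", "quantity": 20},
    {"name": "jane", "place": "nyc", "owns": "platinum", "quantity": 5},
    {"name": "john", "place": "chicago", "owns": "brass", "quantity": 60},
    {"name": "john", "place": "chicago", "owns": "silver", "quantity": 40},
]

# name -> place -> {owns: quantity}; inner levels appear on first write
merged = defaultdict(lambda: defaultdict(dict))
for d in my_dicts:
    merged[d["name"]][d["place"]][d["owns"]] = d["quantity"]

# Flatten back into the requested list-of-dicts shape
new_dicts = [
    {"name": name, "place": place, **holdings}
    for name, places in merged.items()
    for place, holdings in places.items()
]
```

With this version there is no need for the `if merged[name][place]` filter, because empty combinations never exist in the first place.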

Upvotes: 0

thefourtheye

Reputation: 239443

from itertools import groupby

# `data` is the list of dictionaries from the question
result = []
get_owns = lambda x: x["owns"]
get_details = lambda x: (x["name"], x["place"])

# Sort and group the data based on name and place
for key, grp in groupby(sorted(data, key=get_details), key=get_details):

    # Create a dictionary with the name and place
    temp = dict(zip(("name", "place"), key))

    # Sort and group the grouped data based on owns
    for owns, grp1 in groupby(sorted(grp, key=get_owns), key=get_owns):

        # For each material, find and add the sum of quantity in temp
        temp[owns] = sum(item["quantity"] for item in grp1)

    # Add the temp dictionary to the result :-)
    result.append(temp)
print(result)

Output

[{'name': 'jane', 'place': 'nyc', 'platinum': 5},
 {'brass': 60, 'name': 'john', 'place': 'chicago', 'silver': 40},
 {'gold': 30, 'name': 'john', 'place': 'nyc', 'silver': 20}]
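Note that groupby only groups adjacent items, which is why both levels are sorted first. If the sorted order is not needed, the same summing can probably be done in a single pass. The following is a sketch, not a drop-in replacement: it assumes Python 3, and the extra second gold record for john in nyc is made up to show quantities being added together.

```python
from collections import Counter, defaultdict

data = [
    {"name": "john", "place": "nyc", "owns": "gold", "quantity": 30},
    {"name": "john", "place": "nyc", "owns": "silver", "quantity": 20},
    {"name": "jane", "place": "nyc", "owns": "platinum", "quantity": 5},
    {"name": "john", "place": "chicago", "owns": "brass", "quantity": 60},
    {"name": "john", "place": "chicago", "owns": "silver", "quantity": 40},
    # Hypothetical duplicate, not in the question's data
    {"name": "john", "place": "nyc", "owns": "gold", "quantity": 5},
]

# (name, place) -> Counter mapping each material to its summed quantity
totals = defaultdict(Counter)
for item in data:
    totals[item["name"], item["place"]][item["owns"]] += item["quantity"]

result = []
for (name, place), owned in totals.items():
    merged = {"name": name, "place": place}
    merged.update(owned)  # copy the summed quantities into the flat dict
    result.append(merged)

print(result)
```

Here john in nyc ends up with gold: 35, since the two gold records are summed rather than overwritten.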

Upvotes: 0

shaktimaan

Reputation: 12092

This is one way to do it:

dicts = [
    {"name": "john", "place": "nyc", "owns": "gold", "quantity": 30},
    {"name": "john", "place": "nyc", "owns": "silver", "quantity": 20},
    {"name": "jane", "place": "nyc", "owns": "platinum", "quantity": 5},
    {"name": "john", "place": "chicago", "owns": "brass", "quantity": 60},
    {"name": "john", "place": "chicago", "owns": "silver", "quantity": 40}
]

We create a transformed dict with a "place-name" string as the key and the output dict as the value:

transformed_dict = {}
for a_dict in dicts:
    key = '{}-{}'.format(a_dict['place'], a_dict['name'])
    if key not in transformed_dict:
        transformed_dict[key] = {'name': a_dict['name'], 'place': a_dict['place'], a_dict['owns']: a_dict['quantity']}
    else:
        transformed_dict[key][a_dict['owns']] = a_dict['quantity']

transformed_dict now looks like:

{'chicago-john': {'brass': 60,
                  'name': 'john',
                  'place': 'chicago',
                  'silver': 40},
 'nyc-jane': {'name': 'jane', 'place': 'nyc', 'platinum': 5},
 'nyc-john': {'gold': 30, 'name': 'john', 'place': 'nyc', 'silver': 20}}

pprint(list(transformed_dict.values())) (after from pprint import pprint) gives what we want:

[{'gold': 30, 'name': 'john', 'place': 'nyc', 'silver': 20},
 {'brass': 60, 'name': 'john', 'place': 'chicago', 'silver': 40},
 {'name': 'jane', 'place': 'nyc', 'platinum': 5}]
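One caveat with a joined string key: if a name or place can itself contain a hyphen, two different combinations can produce the same key. A small sketch of the issue follows, using two made-up records and hypothetical helper names (string_key, tuple_key, group); a tuple key avoids the ambiguity.

```python
def string_key(record):
    # Same key format as above: "place-name"
    return "{}-{}".format(record["place"], record["name"])

def tuple_key(record):
    # Tuples keep the two fields separate, so nothing can run together
    return (record["place"], record["name"])

def group(records, key_func):
    """Hypothetical helper: merge records that share the same key."""
    grouped = {}
    for rec in records:
        entry = grouped.setdefault(
            key_func(rec), {"name": rec["name"], "place": rec["place"]})
        entry[rec["owns"]] = rec["quantity"]
    return grouped

# Made-up records: two different people whose string keys coincide
records = [
    {"name": "ann-marie", "place": "nyc", "owns": "gold", "quantity": 1},
    {"name": "marie", "place": "nyc-ann", "owns": "silver", "quantity": 2},
]

print(len(group(records, string_key)))  # 1, both collapse into "nyc-ann-marie"
print(len(group(records, tuple_key)))   # 2, kept apart
```

For the data in the question this makes no difference, so it only matters if the real values are less tidy.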

Upvotes: 0

Bryan

Reputation: 6752

My personal strategy for this is roughly outlined below. Define a key function that builds a grouping key from a dict, then collect the dicts into a separate dict indexed by that key. Once you've iterated through all the elements and updated the entry for each key, simply return the .values() of the grouped dict.

dicts = [
    {"name": "john", "place": "nyc", "owns": "gold", "quantity": 30},
    {"name": "john", "place": "nyc", "owns": "silver", "quantity": 20},
    {"name": "jane", "place": "nyc", "owns": "platinum", "quantity": 5},
    {"name": "john", "place": "chicago", "owns": "brass", "quantity": 60},
    {"name": "john", "place": "chicago", "owns": "silver", "quantity": 40}
]

def get_key(instance):
    return "%s-%s" % (instance.get("name"), instance.get("place"), )

grouped = {}

for dict_ in dicts:
    grouped[get_key(dict_)] = grouped.get(get_key(dict_), {})
    grouped[get_key(dict_)].update(dict_)

print(list(grouped.values()))
# [
#   {'owns': 'platinum', 'place': 'nyc', 'name': 'jane', 'quantity': 5},
#   {'name': 'john', 'place': 'nyc', 'owns': 'silver', 'quantity': 20}, 
#   {'name': 'john', 'place': 'chicago', 'owns': 'silver', 'quantity': 40}
# ]
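As the output shows, update() overwrites owns and quantity, so each group keeps only its last record rather than the gold: 30, silver: 20 shape from the question. A possible adjustment is sketched below, assuming Python 3. The function name merge_on and its parameters are invented for this example; it folds each owns/quantity pair into the grouped dict instead.

```python
def merge_on(records, keys, label_field="owns", value_field="quantity"):
    """Hypothetical helper: group records on `keys` and turn each
    label/value pair into its own entry of the merged dict."""
    grouped = {}
    for record in records:
        group_key = tuple(record[k] for k in keys)
        # Start each group with just the shared key fields
        merged = grouped.setdefault(group_key, {k: record[k] for k in keys})
        merged[record[label_field]] = record[value_field]
    return list(grouped.values())

dicts = [
    {"name": "john", "place": "nyc", "owns": "gold", "quantity": 30},
    {"name": "john", "place": "nyc", "owns": "silver", "quantity": 20},
    {"name": "jane", "place": "nyc", "owns": "platinum", "quantity": 5},
    {"name": "john", "place": "chicago", "owns": "brass", "quantity": 60},
    {"name": "john", "place": "chicago", "owns": "silver", "quantity": 40},
]

result = merge_on(dicts, ("name", "place"))
print(result)
```

Passing the key fields as a parameter keeps the idea of a key generator, while the grouping fields can be changed without touching the loop.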

Upvotes: 1
