Reputation: 281
Assume I have the following dictionaries:
{name: "john", place: "nyc", owns: "gold", quantity: 30}
{name: "john", place: "nyc", owns: "silver", quantity: 20}
{name: "jane", place: "nyc", owns: "platinum", quantity: 5}
{name: "john", place: "chicago", owns: "brass", quantity: 60}
{name: "john", place: "chicago", owns: "silver", quantity: 40}
I have hundreds of these small dictionaries. I need to merge the ones that share a subset of common keys, (name, place) in this example, into a new dictionary each. Ultimately, the output should look like the following:
{name: "john", place: "nyc", gold: 30, silver: 20}
{name: "jane", place: "nyc", platinum: 5}
{name: "john", place: "chicago", brass: 60, silver: 40}
Is there an efficient way to do this? All I can think of is brute force: keep track of every possible name-place combination in a list, then traverse the entire data set again for each combination and merge the matching dictionaries into a new one. Thanks!
Upvotes: 3
Views: 638
Reputation: 25954
First, getting the output that you asked for:
data = [{'name': "john", 'place': "nyc", 'owns': "gold", 'quantity': 30},
{'name': "john", 'place': "nyc", 'owns': "silver", 'quantity': 20},
{'name': "jane", 'place': "nyc", 'owns': "platinum", 'quantity': 5},
{'name': "john", 'place': "chicago", 'owns': "brass", 'quantity': 60},
{'name': "john", 'place': "chicago", 'owns': "silver", 'quantity': 40}]
from collections import defaultdict
from itertools import chain

# Collect (owns, quantity) pairs under each (name, place) key
accumulator = defaultdict(list)
for p in data:
    accumulator[p['name'], p['place']].append((p['owns'], p['quantity']))

# Rebuild one flat dict per group; .items() works on Python 2 and 3
[dict(chain([('name', name), ('place', place)], rest)) for (name, place), rest in accumulator.items()]
Out[13]:
[{'name': 'jane', 'place': 'nyc', 'platinum': 5},
{'brass': 60, 'name': 'john', 'place': 'chicago', 'silver': 40},
{'gold': 30, 'name': 'john', 'place': 'nyc', 'silver': 20}]
Now I have to point out that this list-of-dicts data structure you've asked for is super awkward. Dicts are great for lookups, but they perform best when you can just use one for the whole group of objects - if you have to linearly search through a bunch of dicts to find the one you want, you've immediately lost the whole benefit that dict provides in the first place. So that leaves us with a couple of options: go one level deeper and nest dicts within our dict, or use something else entirely.
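To illustrate the nesting option, here is a minimal sketch of a single outer dict keyed by (name, place) tuples, so that finding one person's holdings is a direct lookup rather than a scan over a list. The names records and holdings_by_person are made up for this example.

```python
from collections import defaultdict

records = [
    {"name": "john", "place": "nyc", "owns": "gold", "quantity": 30},
    {"name": "john", "place": "nyc", "owns": "silver", "quantity": 20},
    {"name": "jane", "place": "nyc", "owns": "platinum", "quantity": 5},
    {"name": "john", "place": "chicago", "owns": "brass", "quantity": 60},
    {"name": "john", "place": "chicago", "owns": "silver", "quantity": 40},
]

# One outer dict for the whole data set; each value is an inner dict
# mapping the owned item to its quantity.
holdings_by_person = defaultdict(dict)
for r in records:
    holdings_by_person[r["name"], r["place"]][r["owns"]] = r["quantity"]

# Direct lookup by (name, place), no linear search needed
print(holdings_by_person["john", "nyc"])  # {'gold': 30, 'silver': 20}
```

The trade-off is that name and place now live in the key rather than in each value, which is usually fine as long as you iterate with .items().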
May I suggest making a list of meaningful objects which each represent one of these people? Either create your own class, or use a namedtuple:
from collections import namedtuple
Person = namedtuple('Person','name place holdings')
[Person(name, place, dict(rest)) for (name, place), rest in accumulator.items()]
Out[17]:
[Person(name='jane', place='nyc', holdings={'platinum': 5}),
Person(name='john', place='chicago', holdings={'brass': 60, 'silver': 40}),
Person(name='john', place='nyc', holdings={'silver': 20, 'gold': 30})]
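For the "create your own class" route, a dataclass is one possibility on Python 3.7+. This is only a sketch: the fields mirror the namedtuple above, while total_quantity is a hypothetical helper added purely to show why a class can be handy.

```python
from dataclasses import dataclass, field

@dataclass
class Person:
    name: str
    place: str
    holdings: dict = field(default_factory=dict)

    def total_quantity(self):
        # Hypothetical helper: sum of every quantity this person holds
        return sum(self.holdings.values())

data = [
    {"name": "john", "place": "nyc", "owns": "gold", "quantity": 30},
    {"name": "john", "place": "nyc", "owns": "silver", "quantity": 20},
    {"name": "jane", "place": "nyc", "owns": "platinum", "quantity": 5},
    {"name": "john", "place": "chicago", "owns": "brass", "quantity": 60},
    {"name": "john", "place": "chicago", "owns": "silver", "quantity": 40},
]

# Build one Person per (name, place) pair in a single pass
people = {}
for p in data:
    key = (p["name"], p["place"])
    person = people.setdefault(key, Person(*key))
    person.holdings[p["owns"]] = p["quantity"]

print(people["john", "nyc"].total_quantity())  # 50
```

Unlike the namedtuple, instances here are mutable, so holdings can be filled in incrementally instead of being collected first.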
Upvotes: 7
Reputation: 13356
Maybe a crazy idea, but how about a dict-of-dicts-of-dicts? This would work like a 2D array, with the names and places as the row and column indices.
my_dicts = [
{"name": "john", "place": "nyc", "owns": "gold", "quantity": 30},
{"name": "john", "place": "nyc", "owns": "silver", "quantity": 20},
{"name": "jane", "place": "nyc", "owns": "platinum", "quantity": 5},
{"name": "john", "place": "chicago", "owns": "brass", "quantity": 60},
{"name": "john", "place": "chicago", "owns": "silver", "quantity": 40}
]
all_names = set(d["name"] for d in my_dicts)
all_places = set(d["place"] for d in my_dicts)
merged = {name : {place : {} for place in all_places} for name in all_names}
for d in my_dicts:
    merged[d["name"]][d["place"]][d["owns"]] = d["quantity"]
import pprint
pprint.pprint(merged)
# {'jane': {'chicago': {}, 'nyc': {'platinum': 5}},
#  'john': {'chicago': {'brass': 60, 'silver': 40},
#           'nyc': {'gold': 30, 'silver': 20}}}
Then convert to your desired format:
new_dicts = [{"name" : name, "place" : place} for name in all_names for place in all_places if merged[name][place]]
for d in new_dicts:
    d.update(merged[d["name"]][d["place"]])
pprint.pprint(new_dicts)
# [{'name': 'jane', 'place': 'nyc', 'platinum': 5},
# {'gold': 30, 'name': 'john', 'place': 'nyc', 'silver': 20},
# {'brass': 60, 'name': 'john', 'place': 'chicago', 'silver': 40}]
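One drawback of pre-building the full name-by-place grid is the empty cells it creates (jane/chicago in the output above). A possible variation, sketched here with nested defaultdicts, only creates a cell when a record actually lands in it, so no filtering is needed when converting back. This is an alternative I'm suggesting, not part of the code above.

```python
from collections import defaultdict

my_dicts = [
    {"name": "john", "place": "nyc", "owns": "gold", "quantity": 30},
    {"name": "john", "place": "nyc", "owns": "silver", "quantity": 20},
    {"name": "jane", "place": "nyc", "owns": "platinum", "quantity": 5},
    {"name": "john", "place": "chicago", "owns": "brass", "quantity": 60},
    {"name": "john", "place": "chicago", "owns": "silver", "quantity": 40},
]

# merged[name][place] is created on first access, so unused
# name/place combinations never appear.
merged = defaultdict(lambda: defaultdict(dict))
for d in my_dicts:
    merged[d["name"]][d["place"]][d["owns"]] = d["quantity"]

# Flatten back into the list-of-dicts format from the question
new_dicts = [
    dict(holdings, name=name, place=place)
    for name, places in merged.items()
    for place, holdings in places.items()
]
```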
Upvotes: 0
Reputation: 239443
from itertools import groupby

# data is the list of input dicts from the question
result, get_owns = [], lambda x: x["owns"]
get_details = lambda x: (x["name"], x["place"])

# Sort and group the data based on name and place
for key, grp in groupby(sorted(data, key=get_details), key=get_details):
    # Create a dictionary with the name and place
    temp = dict(zip(("name", "place"), key))

    # Sort and group the grouped data based on owns
    for owns, grp1 in groupby(sorted(grp, key=get_owns), key=get_owns):
        # For each material, find and add the sum of quantity in temp
        temp[owns] = sum(item["quantity"] for item in grp1)

    # Add the temp dictionary to the result :-)
    result.append(temp)

print(result)
Output
[{'name': 'jane', 'place': 'nyc', 'platinum': 5},
{'brass': 60, 'name': 'john', 'place': 'chicago', 'silver': 40},
{'gold': 30, 'name': 'john', 'place': 'nyc', 'silver': 20}]
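Worth noting: this version sums the quantities when the same (name, place, owns) combination appears more than once, whereas plain assignment would keep only the last value. If sorting is a concern for large inputs, the same summing behaviour can probably be had in a single pass. A sketch, with one extra duplicate record that I added purely to show the summing:

```python
from collections import defaultdict

data = [
    {"name": "john", "place": "nyc", "owns": "gold", "quantity": 30},
    {"name": "john", "place": "nyc", "owns": "silver", "quantity": 20},
    {"name": "jane", "place": "nyc", "owns": "platinum", "quantity": 5},
    {"name": "john", "place": "chicago", "owns": "brass", "quantity": 60},
    {"name": "john", "place": "chicago", "owns": "silver", "quantity": 40},
    # Hypothetical duplicate, not in the question's data
    {"name": "john", "place": "nyc", "owns": "gold", "quantity": 5},
]

# totals[(name, place)][owns] accumulates quantities without any sorting
totals = defaultdict(lambda: defaultdict(int))
for item in data:
    totals[item["name"], item["place"]][item["owns"]] += item["quantity"]

result = [
    dict(owned, name=name, place=place)
    for (name, place), owned in totals.items()
]
```

With the duplicate included, john in nyc ends up with gold: 35 rather than 30 or 5.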
Upvotes: 0
Reputation: 12092
This is one way to do it:
dicts = [
{"name": "john", "place": "nyc", "owns": "gold", "quantity": 30},
{"name": "john", "place": "nyc", "owns": "silver", "quantity": 20},
{"name": "jane", "place": "nyc", "owns": "platinum", "quantity": 5},
{"name": "john", "place": "chicago", "owns": "brass", "quantity": 60},
{"name": "john", "place": "chicago", "owns": "silver", "quantity": 40}
]
We create a transformed dict with place-name as key and output dict as the value:
transformed_dict = {}
for a_dict in dicts:
    key = '{}-{}'.format(a_dict['place'], a_dict['name'])
    if key not in transformed_dict:
        transformed_dict[key] = {'name': a_dict['name'], 'place': a_dict['place'], a_dict['owns']: a_dict['quantity']}
    else:
        transformed_dict[key][a_dict['owns']] = a_dict['quantity']
transformed_dict now looks like:
{'chicago-john': {'brass': 60,
'name': 'john',
'place': 'chicago',
'silver': 40},
'nyc-jane': {'name': 'jane', 'place': 'nyc', 'platinum': 5},
'nyc-john': {'gold': 30, 'name': 'john', 'place': 'nyc', 'silver': 20}}
pprint(list(transformed_dict.values())) gives what we want:
[{'gold': 30, 'name': 'john', 'place': 'nyc', 'silver': 20},
{'brass': 60, 'name': 'john', 'place': 'chicago', 'silver': 40},
{'name': 'jane', 'place': 'nyc', 'platinum': 5}]
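One caveat with the formatted string key: if a name or place can itself contain a hyphen, two different pairs can produce the same key and get merged by accident. A tuple key avoids this. A small sketch, using made-up records chosen specifically to collide:

```python
# Hypothetical records: both format to the key "stratford-upon-lee"
records = [
    {"name": "lee", "place": "stratford-upon", "owns": "gold", "quantity": 1},
    {"name": "upon-lee", "place": "stratford", "owns": "silver", "quantity": 2},
]

string_keyed, tuple_keyed = {}, {}
for r in records:
    base = {"name": r["name"], "place": r["place"]}
    str_key = "{}-{}".format(r["place"], r["name"])
    tup_key = (r["place"], r["name"])
    # setdefault inserts the base dict only the first time a key is seen
    string_keyed.setdefault(str_key, dict(base))[r["owns"]] = r["quantity"]
    tuple_keyed.setdefault(tup_key, dict(base))[r["owns"]] = r["quantity"]

print(len(string_keyed))  # 1, the two people were merged by accident
print(len(tuple_keyed))   # 2
```

setdefault also replaces the if/else branch, since the first record for a key and later ones are handled by the same line.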
Upvotes: 0
Reputation: 6752
My personal strategy for this is roughly outlined below. Define a key generator that takes a dict instance, then group the dicts in a separate dict by the generated key. Once you've iterated through all the elements and updated each group by its key, simply return the .values() of the grouped dict.
dicts = [
{"name": "john", "place": "nyc", "owns": "gold", "quantity": 30},
{"name": "john", "place": "nyc", "owns": "silver", "quantity": 20},
{"name": "jane", "place": "nyc", "owns": "platinum", "quantity": 5},
{"name": "john", "place": "chicago", "owns": "brass", "quantity": 60},
{"name": "john", "place": "chicago", "owns": "silver", "quantity": 40}
]
def get_key(instance):
    return "%s-%s" % (instance.get("name"), instance.get("place"), )

grouped = {}
for dict_ in dicts:
    grouped[get_key(dict_)] = grouped.get(get_key(dict_), {})
    grouped[get_key(dict_)].update(dict_)

print(list(grouped.values()))
# [
# {'owns': 'platinum', 'place': 'nyc', 'name': 'jane', 'quantity': 5},
# {'name': 'john', 'place': 'nyc', 'owns': 'silver', 'quantity': 20},
# {'name': 'john', 'place': 'chicago', 'owns': 'silver', 'quantity': 40}
# ]
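Note that, as the output shows, update(dict_) copies the owns and quantity keys as they are, so each group keeps only its last record. If the goal is the format from the question, one possible adjustment is to store each quantity under its owns value instead. This is a sketch, not the code above: merge_records and key_fields are hypothetical names, and the key is a tuple rather than a formatted string.

```python
def merge_records(dicts, key_fields=("name", "place")):
    grouped = {}
    for dict_ in dicts:
        # Key generator: the values of the shared fields, as a tuple
        key = tuple(dict_.get(f) for f in key_fields)
        # Start each group with just the shared fields
        group = grouped.setdefault(key, {f: dict_.get(f) for f in key_fields})
        # Fold the owns/quantity pair into a single entry
        group[dict_["owns"]] = dict_["quantity"]
    return list(grouped.values())

dicts = [
    {"name": "john", "place": "nyc", "owns": "gold", "quantity": 30},
    {"name": "john", "place": "nyc", "owns": "silver", "quantity": 20},
    {"name": "jane", "place": "nyc", "owns": "platinum", "quantity": 5},
    {"name": "john", "place": "chicago", "owns": "brass", "quantity": 60},
    {"name": "john", "place": "chicago", "owns": "silver", "quantity": 40},
]

merged = merge_records(dicts)
```

Passing a different key_fields tuple lets the same function group on another subset of common keys.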
Upvotes: 1