Reputation: 109
I have a list of dictionaries where each dictionary has keys 'shape' and 'colour'. For example:
info = [
{'shape': 'pentagon', 'colour': 'red'},
{'shape': 'rectangle', 'colour': 'white'},
# etc etc
]
I need to find the most commonly occurring shape/colour combination. I decided to do this by finding the most commonly occurring dictionary in the list. I've cut my method down to this:
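import json
from collections import defaultdict

frequency = defaultdict(int)
for i in info:
    # Turn the dict into a string so it can be stored as a key in the frequency dict;
    # sort_keys=True makes the string independent of key insertion order
    hashed = json.dumps(i, sort_keys=True)
    frequency[hashed] += 1
most_common = max(frequency, key=frequency.get)  # Get the most common key and parse it back into a dict
print(json.loads(most_common))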
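One caveat with the string-key approach: json.dumps writes keys in the dict's insertion order, so two dicts with the same content but keys inserted in a different order produce different strings and would be counted separately. Passing sort_keys=True normalizes this. A small sketch (the two sample dicts are made up for illustration):

```python
import json

a = {'shape': 'pentagon', 'colour': 'red'}
b = {'colour': 'red', 'shape': 'pentagon'}  # same content, different key order

print(a == b)                          # True: dict equality ignores key order
print(json.dumps(a) == json.dumps(b))  # False: serialized key order differs
# With sorted keys both dicts serialize to the same string
print(json.dumps(a, sort_keys=True) == json.dumps(b, sort_keys=True))  # True
```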
I'm fairly new to Python, and I always end up finding out about some one- or two-line function that does what I wanted to do in the first place. Is there a quicker method in this case? Maybe this will help another beginner too, because I couldn't find anything after ages of googling.
Upvotes: 4
Views: 2036
Reputation: 2637
If the items in the list have consistent keys, a better option is to use a namedtuple in place of a dict, e.g.:
from collections import namedtuple
# Define the named tuple
MyItem = namedtuple("MyItem", "shape colour")
# Create your list of data
info = [
MyItem('pentagon', 'red'),
MyItem('rectangle', 'white'),
# etc etc
]
This provides a number of benefits:
# To instantiate
item = MyItem("pentagon", "red")
# or using keyword arguments
item = MyItem(shape="pentagon", colour="red")
# or from your existing dict
item = MyItem(**{'shape': 'pentagon', 'colour': 'red'})
# Accessors
print(item.shape)
print(item.colour)
# Decomposition
shape, colour = item
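If you already have a list of dicts, as in the question, the keyword-unpacking form above can convert the whole list in one comprehension. A minimal sketch; the name dict_info is hypothetical and stands for your existing list:

```python
from collections import namedtuple

MyItem = namedtuple("MyItem", "shape colour")

# Hypothetical name for the existing list of dicts from the question
dict_info = [
    {'shape': 'pentagon', 'colour': 'red'},
    {'shape': 'rectangle', 'colour': 'white'},
]

# Unpack each dict into keyword arguments; this raises TypeError if a dict
# has a key that is not a field of MyItem, or is missing one of the fields
info = [MyItem(**d) for d in dict_info]
print(info)
```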
However, back to the question of counting matching items: because a namedtuple is hashable, collections.Counter can be used, and the counting code then becomes:
from collections import Counter
frequency = Counter(info)
# Get the items ordered from most to least common, as (item, count) pairs
print(frequency.most_common())
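To pull out just the single most common combination, note that most_common(1) returns a list holding one (item, count) pair, and a namedtuple's _asdict() method turns it back into a dict. A self-contained sketch with made-up sample data:

```python
from collections import Counter, namedtuple

MyItem = namedtuple("MyItem", "shape colour")

# Illustrative data: one combination appears twice
info = [
    MyItem('pentagon', 'red'),
    MyItem('rectangle', 'white'),
    MyItem('pentagon', 'red'),
]

frequency = Counter(info)

# most_common(1) is a one-element list of (item, count) pairs
item, count = frequency.most_common(1)[0]
print(item, count)           # MyItem(shape='pentagon', colour='red') 2

# _asdict() converts the namedtuple back into a dict, if one is needed
print(dict(item._asdict()))  # {'shape': 'pentagon', 'colour': 'red'}
```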
Enjoy!
Upvotes: 5
Reputation: 1540
Using pandas would make your problem much simpler.
import pandas as pd
info = [
{'shape': 'pentagon', 'colour': 'red'},
{'shape': 'rectangle', 'colour': 'white'},
# etc etc
]
df = pd.DataFrame(info)
# to get the count of each shape (sorted, most common first)
print(df['shape'].value_counts())
# to get the most commonly occurring shape
print(df['shape'].value_counts().idxmax())
# (in current pandas, Series.argmax() returns the integer position of the
# maximum rather than its label, so use idxmax() to get the shape itself)
To get the most commonly occurring colour, just change 'shape' to 'colour', e.g. change
print(df['shape'].value_counts())
to
print(df['colour'].value_counts())
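If installing pandas is not an option, a rough standard-library stand-in for value_counts() on a single key is a collections.Counter over that key's values. This is a swapped-in technique, not part of the pandas answer, and like the pandas code it counts each key independently rather than the shape/colour combination. The sample data is illustrative:

```python
from collections import Counter

info = [
    {'shape': 'pentagon', 'colour': 'red'},
    {'shape': 'rectangle', 'colour': 'white'},
    {'shape': 'pentagon', 'colour': 'white'},
]

# Count each key separately, similar to df['shape'].value_counts()
shape_counts = Counter(d['shape'] for d in info)
colour_counts = Counter(d['colour'] for d in info)

# most_common(1)[0][0] is the value itself, similar to idxmax()
print(shape_counts.most_common(1)[0][0])   # pentagon
print(colour_counts.most_common(1)[0][0])  # white
```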
Beyond this, pandas gives you a lot of other useful built-in functions to play around with. To learn more, just search for pandas online.
NOTE: Please install pandas before using it.
pip install pandas
or
pip3 install pandas
Upvotes: 2
Reputation: 61526
Instead of converting the dict into a particular string representation, grab the data you need from each. Making a tuple of the two string values gives you something hashable that works as a dict key.
The Python standard library provides collections.Counter for this exact counting purpose. Thus:
from collections import Counter
info = [
    {'shape': 'pentagon', 'colour': 'red'},
    {'shape': 'rectangle', 'colour': 'white'},
    # etc etc
]
histogram = Counter((item['shape'], item['colour']) for item in info)
# most_common(n) returns a list of the n most common (item, count) pairs;
# here each item is itself a (shape, colour) tuple
(shape, colour), count = histogram.most_common(1)[0]
# re-assemble the dict, if desired, and print it.
print({'shape': shape, 'colour': colour})
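If the dicts might have other or varying keys, one generalization of the same idea (an extension, not part of the answer above) is to use each dict's sorted (key, value) pairs as the hashable Counter key, then rebuild the dict with dict(). This assumes all values are hashable and all keys can be compared with each other (plain strings here). Sample data is illustrative:

```python
from collections import Counter

info = [
    {'shape': 'pentagon', 'colour': 'red'},
    {'colour': 'red', 'shape': 'pentagon'},  # same content, different key order
    {'shape': 'rectangle', 'colour': 'white'},
]

# Sorting the pairs makes key order irrelevant; since keys in a dict are
# unique, only the keys are ever compared, while values just need to be hashable
histogram = Counter(tuple(sorted(d.items())) for d in info)

pairs, count = histogram.most_common(1)[0]
print(dict(pairs), count)  # {'colour': 'red', 'shape': 'pentagon'} 2
```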
Upvotes: 2