Jamesyyy

Reputation: 109

Best way to find most commonly occurring dictionary inside a list of dictionaries

I have a list of dictionaries where each dictionary has keys 'shape' and 'colour'. For example:

info = [
    {'shape': 'pentagon', 'colour': 'red'},
    {'shape': 'rectangle', 'colour': 'white'},
    # etc etc
]

I need to find the most commonly occurring shape/colour combination. I decided to do this by finding the most commonly occurring dictionary in the list. I've cut my method down to this:

import json
from collections import defaultdict

frequency = defaultdict(int)

for i in info:
    hashed = json.dumps(i)  # Turn the dictionary into a string so it can be stored as a key in frequency
    frequency[hashed] += 1

most_common = max(frequency, key=frequency.get)  # Get the key with the highest count
print(json.loads(most_common))  # Turn the string back into a dict

I'm fairly new to Python, and I often find out afterwards that some one- or two-line function already does what I wanted. Is there a quicker method in this case? Hopefully this also helps other beginners, because I couldn't find anything after a lot of googling.

Upvotes: 4

Views: 2036

Answers (3)

Tim

Reputation: 2637

If the items in the list have consistent keys, a better option is to use a namedtuple in place of a dict, e.g.:

from collections import namedtuple

# Define the named tuple
MyItem = namedtuple("MyItem", "shape colour")

# Create your list of data
info = [
    MyItem('pentagon', 'red'),
    MyItem('rectangle', 'white'),
    # etc etc
]

This provides a number of benefits:

# To instantiate
item = MyItem("pentagon", "red")

# or using keyword arguments
item = MyItem(shape="pentagon", colour="red")

# or from your existing dict
item = MyItem(**{'shape': 'pentagon', 'colour': 'red'})

# Accessors
print(item.shape)
print(item.colour)

# Decomposition
shape, colour = item

However, back to the question of counting matching items: because a namedtuple is hashable, collections.Counter can be used, and the counting code becomes:

from collections import Counter

frequency = Counter(info)

# Get a list of (item, count) pairs, ordered from most to least common
frequency.most_common()
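
If the data already exists as a list of dicts, as in the question, the conversion and the count can be combined. This is a minimal sketch, assuming every dict has exactly the keys shape and colour; the sample data and the name ShapeColour are made up for illustration:

```python
from collections import Counter, namedtuple

# Hypothetical name; any namedtuple with matching field names would work.
ShapeColour = namedtuple("ShapeColour", "shape colour")

# Made-up sample data in the question's format.
info = [
    {'shape': 'pentagon', 'colour': 'red'},
    {'shape': 'rectangle', 'colour': 'white'},
    {'shape': 'pentagon', 'colour': 'red'},
]

# Convert each dict to a hashable namedtuple, then count occurrences.
frequency = Counter(ShapeColour(**d) for d in info)

# most_common(1) returns a one-element list of (item, count) pairs.
top_item, top_count = frequency.most_common(1)[0]

# _asdict() turns the namedtuple back into a mapping if a dict is needed.
print(dict(top_item._asdict()), top_count)
```

Note that ShapeColour(**d) raises a TypeError if a dict has missing or extra keys, which is probably what you want when the keys are supposed to be consistent.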

Enjoy!

Upvotes: 5

Joish

Reputation: 1540

Using pandas would make your problem much simpler.

import pandas as pd

info = [
    {'shape': 'pentagon', 'colour': 'red'},
    {'shape': 'rectangle', 'colour': 'white'},
    # etc etc
]

df = pd.DataFrame(info)

# to get the count of each shape
print(df['shape'].value_counts())

# to get the most commonly occurring shape
# (use idxmax(); in recent pandas versions argmax() returns the integer position, not the label)
print(df['shape'].value_counts().idxmax())

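To get the most commonly occurring colour, replace 'shape' with 'colour', e.g. print(df['colour'].value_counts()). To count the shape/colour combination the question asks about, group on both columns: df.groupby(['shape', 'colour']).size().idxmax() returns the most frequent (shape, colour) pair.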

Beyond this, pandas provides many other built-in functions to play around with. To learn more, search for the pandas documentation.

NOTE: Please install pandas before using it.

pip install pandas 

or

pip3 install pandas

Upvotes: 2

Karl Knechtel

Reputation: 61526

  1. Instead of converting the dict into a particular string representation, grab the data you need from each. Making a tuple of the two string values gives you something hashable that works as a dict key.

  2. The Python standard library provides collections.Counter for this exact counting purpose.

Thus:

from collections import Counter
# info is the list of dicts from the question
histogram = Counter((item['shape'], item['colour']) for item in info)
# the most_common method gives a list of the n most common (item, count) pairs.
(shape, colour), count = histogram.most_common(1)[0]
# re-assemble the dict, if desired, and print it.
print({'shape': shape, 'colour': colour})
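
The same idea of building a hashable key from each dict can be extended to dicts whose keys are not known in advance. This is only a sketch, assuming all values are hashable and the list is non-empty; most_common_dict is a hypothetical helper name and the sample data is made up:

```python
from collections import Counter

def most_common_dict(dicts):
    """Return (dict, count) for the most frequent dict in a non-empty list.

    Hypothetical helper; assumes every value in every dict is hashable.
    """
    # frozenset(d.items()) is hashable and does not depend on key order.
    counts = Counter(frozenset(d.items()) for d in dicts)
    items, count = counts.most_common(1)[0]
    # Rebuild an ordinary dict from the (key, value) pairs.
    return dict(items), count

# Made-up sample data; note the second 'pentagon' dict lists its keys in a different order.
info = [
    {'shape': 'pentagon', 'colour': 'red'},
    {'shape': 'rectangle', 'colour': 'white'},
    {'colour': 'red', 'shape': 'pentagon'},
]

result, result_count = most_common_dict(info)
print(result, result_count)
```

Unlike a plain json.dumps(d) key, this treats two dicts with the same contents but different key order as equal. The key order of the rebuilt dict is not guaranteed to match the original.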

Upvotes: 2
