kevingduck
kevingduck

Reputation: 531

New dict of top n values (and keys) from dictionary (Python)

I have a dictionary of names and the number of times the names appear in the phone book:

names_dict = {
    'Adam': 100,
    'Anne': 400,
    'Britney': 321,
    'George': 645,
    'Joe': 200,
    'John': 1010,
    'Mike': 500,
    'Paul': 325,
    'Sarah': 150
}

Preferably without using sorted(), I want to iterate through the dictionary and create a new dictionary that has the top five names only:

def sort_top_list():
  # create dict of any 5 names first
  new_dict = {}
  for i in names_dict.keys()[:5]:
    new_dict[i] = names_dict[i]:

  # Find smallest current value in new_dict
  # and compare to others in names_dict
  # to find bigger ones; replace smaller name in new_dict with bigger name
  for k,v in address_dict.iteritems():
    current_smallest = min(new_dict.itervalues())
    if v > current_smallest:
      # Found a bigger value; replace smaller key/ value in new_dict with larger key/ value
      new_dict[k] = v
      # ?? delete old key/ value pair from new_dict somehow

I seem to be able to create a new dictionary that gets a new key/ value pair whenever we iterate through names_dict and find a name/ count that is higher than what we have in new_dict. I can't figure out, though, how to remove the smaller ones from new_dict after we add the bigger ones from names_dict.

Is there a better way - without having to import special libraries or use sorted() - to iterate through a dict and create a new dict of the top N keys with the highest values?

Upvotes: 3

Views: 1757

Answers (2)

Martijn Pieters
Martijn Pieters

Reputation: 1123500

You should use the heapq.nlargest() function to achieve this:

import heapq
from operator import itemgetter

top_names = dict(heapq.nlargest(5, names_dict.items(), key=itemgetter(1)))

This uses a more efficient algorithm (O(NlogK) for a dict of size N, and K top items) to extract the top 5 items as (key, value) tuples, which are then passed to dict() to create a new dictionary.

Demo:

>>> import heapq
>>> from operator import itemgetter
>>> names_dict = {'Adam': 100, 'Anne': 400, 'Britney': 321, 'George': 645, 'Joe': 200, 'John': 1010, 'Mike': 500, 'Paul': 325, 'Sarah': 150}
>>> dict(heapq.nlargest(5, names_dict.items(), key=itemgetter(1)))
{'John': 1010, 'George': 645, 'Mike': 500, 'Anne': 400, 'Paul': 325}

You probably want to use the collections.Counter() class instead. The Counter.most_common() method would have made your use-case trivial to solve. The implementation for that method uses heapq.nlargest() under the hood.

These are not special libraries, they are part of the Python standard library. You otherwise would have to implement a binary heap yourself to achieve this. Unless you are specifically studying this algorithm, there is little point in re-implementing your own, the Python implementation is highly optimised with an extension written in C for some critical functions).

Upvotes: 6

am2
am2

Reputation: 371

I do not know, why you don't want to use sort and the solution is not perfect and even doesn't match your problem exactly, but I hope it can inspire you to find your own implementation. I think it was only a short example for the real Problem you have.

But as you have seen on the other answer: Normally it is better to use code, that is written before instead of do all the things yourself.

names_dict = {'Joe' : 200, 'Anne': 400, 'Mike': 500, 'John': 1010, 'Sarah': 150, 'Paul': 325, 'George' : 645, 'Adam' : 100, 'Britney': 321}

def extract_top_n(dictionary, count):
    #first step: Find the topmost values
    highest_values = []
    for k,v in dictionary.iteritems():
        print k,v, highest_values, len(highest_values)
        highest_values.append(v)
        l = len(highest_values)
        for i in range(l-1):
            print i,l
            if l-i < 1:
                break
            if highest_values[l-i-1]>highest_values[l-i-2]:
                temp = highest_values[l-i-2]
                highest_values[l-i-2] = highest_values[l-i-1]
                highest_values[l-i-1] = temp
        highest_values = highest_values [:count]

    #fill the dirctionary with all entries at least as big as the smallest of the biggest
    #but pay attention: If there are more than 2 occurances of one of the top N there will be more than N entries in the dictionary
    last_interesting = highest_values[len(highest_values)-1]
    return_dictionary = {}    
    for k,v in dictionary.iteritems():
        if v >= last_interesting:
            return_dictionary[k] = v
    return return_dictionary

print extract_top_n(names_dict,3)        

Upvotes: 0

Related Questions