Reputation: 658
I'm trying to find the fastest solution to this simple problem. Right now I'm using a dictionary.
I have a bunch of string - int pairs that are constantly being thrown out for new ones. And I need to find the strings that match the n-highest ints (usually just 1, 2 or 3).
So the building of my data structure must be efficient, but also finding the max int and it's paired string is important.
Is a dictionary close to the best data structure? If so, should my ints be the keys or the values?
The language is python3 if it matters.
Upvotes: 1
Views: 1589
Reputation: 164823
And I need to find the strings that match the n-highest ints (usually just 1, 2 or 3).
You can use heapq
with a dictionary. Below is an example extracting the strings associated with the 3 highest integers. Duplicate numbers are only included until the heap is full.
import heapq
from operator import itemgetter
L = [(1, 'test'), (1000, 'big number'), (500, 'middle'), (5000, 'even bigger'),
(4000, 'big, but not biggest'), (5000, 'another even bigger')]
d = {v: k for k, v in L}
heap = heapq.nlargest(3, d.items(), key=itemgetter(1))
res = list(map(itemgetter(0), heap))
print(res)
['even bigger', 'another even bigger', 'big, but not biggest']
As described in this answer, the solution will have O(n log n) time complexity.
Upvotes: 1
Reputation: 109726
Try a SortedList
from the sortedcontainers
module ("written in pure-Python, and fast as C-extensions").
At the core of Sorted Containers is the mutable sequence data type SortedList. The SortedList maintains its values in ascending sort order. As with Python’s built-in list data type, SortedList supports duplicate elements and fast random-access indexing.
Values may be added to a SortedList using either SortedList.update() or SortedList.add(). When doing so, the list remains sorted.
Because SortedList is sorted, it supports efficient lookups by value or by index.
Install the module if you don't have it:
$ pip install sortedcontainers
Store your values and strings as tuple pairs:
from sortedcontainers import SortedList
sorted_list = SortedList()
# Sample data.
sorted_list.update([
(1, 'test'),
(1000, 'big number'),
(500, 'middle')])
>>> sorted_list[-1]
(1000, 'big number')
sorted_list.add((5000, 'even bigger'))
sorted_list.add((4000, 'big, but not biggest'))
# Get last two largest values.
>>> sorted_list[-2:]
[(4000, 'big, but not biggest'), (5000, 'even bigger')]
Upvotes: 1