CSStudent7782
CSStudent7782

Reputation: 658

What's the most efficient data structure to find the max of one value of a pair and needs to be constantly re-built?

I'm trying to find the fastest solution to this simple problem. Right now I'm using a dictionary.

I have a bunch of string - int pairs that are constantly being thrown out for new ones. And I need to find the strings that match the n-highest ints (usually just 1, 2 or 3).

So the building of my data structure must be efficient, but also finding the max int and it's paired string is important.

Is a dictionary close to the best data structure? If so, should my ints be the keys or the values?

The language is python3 if it matters.

Upvotes: 1

Views: 1589

Answers (2)

jpp
jpp

Reputation: 164823

And I need to find the strings that match the n-highest ints (usually just 1, 2 or 3).

You can use heapq with a dictionary. Below is an example extracting the strings associated with the 3 highest integers. Duplicate numbers are only included until the heap is full.

import heapq
from operator import itemgetter

L = [(1, 'test'), (1000, 'big number'), (500, 'middle'), (5000, 'even bigger'),
     (4000, 'big, but not biggest'), (5000, 'another even bigger')]

d = {v: k for k, v in L}

heap = heapq.nlargest(3, d.items(), key=itemgetter(1))

res = list(map(itemgetter(0), heap))

print(res)

['even bigger', 'another even bigger', 'big, but not biggest']

As described in this answer, the solution will have O(n log n) time complexity.

Upvotes: 1

Alexander
Alexander

Reputation: 109726

Try a SortedList from the sortedcontainers module ("written in pure-Python, and fast as C-extensions").

At the core of Sorted Containers is the mutable sequence data type SortedList. The SortedList maintains its values in ascending sort order. As with Python’s built-in list data type, SortedList supports duplicate elements and fast random-access indexing.

Values may be added to a SortedList using either SortedList.update() or SortedList.add(). When doing so, the list remains sorted.

Because SortedList is sorted, it supports efficient lookups by value or by index.

Install the module if you don't have it:

$ pip install sortedcontainers

Store your values and strings as tuple pairs:

from sortedcontainers import SortedList

sorted_list = SortedList()

# Sample data.
sorted_list.update([
    (1, 'test'), 
    (1000, 'big number'), 
    (500, 'middle')])

>>> sorted_list[-1]
(1000, 'big number')

sorted_list.add((5000, 'even bigger'))
sorted_list.add((4000, 'big, but not biggest'))

# Get last two largest values.
>>> sorted_list[-2:]
[(4000, 'big, but not biggest'), (5000, 'even bigger')]

Upvotes: 1

Related Questions