Jonah Bishop
Jonah Bishop

Reputation: 12581

Efficiently finding the previous key in an OrderedDictionary

I have an OrderedDictionary that contains rate values. Each entry has a date for a key (each date happening to be the start of a yearly quarter), and the value is a number. Dates are inserted in order, from oldest to newest.

{
    date(2017, 1, 1): 95,
    date(2018, 1, 1): 100,
    date(2018, 6, 1): 110,
    date(2018, 9, 1): 112,
}

My dictionary of rates is much larger than this, but this is the general idea. Given an arbitrary date, I want to find the value in the dictionary that precedes it. For example, looking up a date of date(2018, 8, 1) should return the value 110, since the entry date(2018, 6, 1) is the nearest key that precedes my date lookup. Similarly, a date of date(2017, 12, 1) should return 95, as the nearest preceding key happens to be date(2017, 1, 1).

I could easily do this by walking the items in the dictionary:

def find_nearest(lookup):
    nearest = None
    for d, value in rates.items():
        if(d > lookup):
            break
        nearest = value
    return nearest

This feels inefficient to me, however, since in the worst case I have to scan the entire dictionary (which I mentioned before could be large). I will be doing tens of thousands of these kinds of lookups, so I want it to be performant.

Another option to solve the performance issue would be to create a cache of what I've seen, which is also doable, though I wonder about the memory constraints (I'm not entirely sure how large the cache would grow).

Are there any clever methods or Python core modules I can use here?

Upvotes: 9

Views: 714

Answers (5)

jpp
jpp

Reputation: 164773

Since OrderedDict is implemented via linked lists, you cannot directly retrieve values by position in less than O(n) time, although you can take advantage of keys being sorted to reduce to O(log n). See also: Accessing dictionary items by position in Python 3.6+ efficiently.

For efficiency, I suggest you use a 3rd party library such as Pandas, which uses NumPy arrays held in contiguous memory blocks. The time complexity is O(n), but you should see improved performance for larger input dictionaries.

import pandas as pd
from datetime import date

d = {date(2017, 1, 1): 95, date(2018, 1, 1): 100,
     date(2018, 6, 1): 110, date(2018, 9, 1): 112}

df = pd.DataFrame.from_dict(d, orient='index')
df.index = pd.to_datetime(df.index)

my_date = pd.to_datetime(date(2018, 8, 1))
res = df[0].iat[df.index.get_loc(my_date, method='ffill')]  # 110

An alternative, more verbose method:

diffs = (my_date - df.index) > pd.Timedelta(0)
res = df[0].iat[-(diffs[::-1].argmax() + 1)]                # 110

Upvotes: 0

blhsing
blhsing

Reputation: 106901

Since you're inserting dates into the dict in order and you are presumably using Python 3.7 (which makes dict order significant), you can use a recursive function that divides and conquers to find the desired index of the key list in O(log n) time complexity:

def find_nearest(l, lookup):
    if len(l) == 1:
        return l[0]
    mid = len(l) // 2
    if l[mid] > lookup:
        return find_nearest(l[:mid], lookup)
    return find_nearest(l[mid:], lookup)

so that:

from datetime import date
d = {
    date(2017, 1, 1): 95,
    date(2018, 1, 1): 100,
    date(2018, 6, 1): 110,
    date(2018, 9, 1): 112,
}
d[find_nearest(list(d), date(2018, 8, 1))]

returns: 110

Upvotes: 2

Michele Tonutti
Michele Tonutti

Reputation: 4348

Edit I just realized you wanted a core module – my answer uses pandas!

If you have unique date values, you can use pandas to create a dataframe which uses the dates as indices:

df = pd.DataFrame.from_dict(rates, orient='index', columns=['value'])
# Convert index to pandas datetime
df.index = pd.to_datetime(df.index)

This returns:

              value
2017-01-01     95
2018-01-01    100
2018-06-01    110
2018-09-01    112

Then:

def lookup(date, df):
    # Convert input to datetime
    date = pd.to_datetime(date)
    # Get closest date in index
    closest_date = min(df.index, key=lambda x: abs(x - date))
    # Find corresponding index of closest date
    index = np.where(df.index == closest_date)[0][0]
    # If the date found if greater than the input, then get the date at the index before
    if closest_date > date:
        index -= 1

    return df.iloc[index].value

>> lookup('2018-06-02', df)
Out: 110

>> lookup('2018-05-01', df)
Out: 100

Upvotes: 0

zxch3n
zxch3n

Reputation: 387

sortedcontainers may be what you want.

It will keep the key in sorted order rather than insert order, which is different from collections.OrderedDict.

Install

$ pip install sortedcontainers

To achieve what you want

from sortedcontainers import SortedDict
def find_nearest(sorted_dict, lookup):
    key = sorted_dict.iloc[sorted_dict.bisect_left(lookup) - 1]
    return sorted_dict[key]

sd = SortedDict({0: '0', 4: '4', 8: '8', 12: '12'})
print(find_nearest(sd, 4))  # 0
print(find_nearest(sd, 3))  # 0
print(find_nearest(sd, 12))  # 8 

The time complexity of this method is O(log n)

Upvotes: 2

Petru Tanas
Petru Tanas

Reputation: 1247

you could try .get() method, which returns a value only if it exists, otherwise returns None

import datetime
from datetime import date

def findNearest(somedate, dictionary):
    while dictionary.get(somedate) is None:
        somedate=somedate-datetime.timedelta(1)

    return dictionary.get(somedate)


result=findNearest(date(2017, 1, 3), yourDictionary)

when you print result it will print '95', the value for date(2017, 1, 1)

Upvotes: -1

Related Questions