Reputation: 12581
I have an OrderedDictionary that contains rate values. Each entry has a date for a key (each date happening to be the start of a yearly quarter), and the value is a number. Dates are inserted in order, from oldest to newest.
{
date(2017, 1, 1): 95,
date(2018, 1, 1): 100,
date(2018, 6, 1): 110,
date(2018, 9, 1): 112,
}
My dictionary of rates is much larger than this, but this is the general idea. Given an arbitrary date, I want to find the value in the dictionary that precedes it. For example, looking up a date of date(2018, 8, 1)
should return the value 110, since the entry date(2018, 6, 1)
is the nearest key that precedes my date lookup. Similarly, a date of date(2017, 12, 1)
should return 95, as the nearest preceding key happens to be date(2017, 1, 1)
.
I could easily do this by walking the items in the dictionary:
def find_nearest(lookup):
nearest = None
for d, value in rates.items():
if(d > lookup):
break
nearest = value
return nearest
This feels inefficient to me, however, since in the worst case I have to scan the entire dictionary (which I mentioned before could be large). I will be doing tens of thousands of these kinds of lookups, so I want it to be performant.
Another option to solve the performance issue would be to create a cache of what I've seen, which is also doable, though I wonder about the memory constraints (I'm not entirely sure how large the cache would grow).
Are there any clever methods or Python core modules I can use here?
Upvotes: 9
Views: 714
Reputation: 164773
Since OrderedDict
is implemented via linked lists, you cannot directly retrieve values by position in less than O(n) time, although you can take advantage of keys being sorted to reduce to O(log n). See also: Accessing dictionary items by position in Python 3.6+ efficiently.
For efficiency, I suggest you use a 3rd party library such as Pandas, which uses NumPy arrays held in contiguous memory blocks. The time complexity is O(n), but you should see improved performance for larger input dictionaries.
import pandas as pd
from datetime import date
d = {date(2017, 1, 1): 95, date(2018, 1, 1): 100,
date(2018, 6, 1): 110, date(2018, 9, 1): 112}
df = pd.DataFrame.from_dict(d, orient='index')
df.index = pd.to_datetime(df.index)
my_date = pd.to_datetime(date(2018, 8, 1))
res = df[0].iat[df.index.get_loc(my_date, method='ffill')] # 110
An alternative, more verbose method:
diffs = (my_date - df.index) > pd.Timedelta(0)
res = df[0].iat[-(diffs[::-1].argmax() + 1)] # 110
Upvotes: 0
Reputation: 106901
Since you're inserting dates into the dict in order and you are presumably using Python 3.7 (which makes dict order significant), you can use a recursive function that divides and conquers to find the desired index of the key list in O(log n) time complexity:
def find_nearest(l, lookup):
if len(l) == 1:
return l[0]
mid = len(l) // 2
if l[mid] > lookup:
return find_nearest(l[:mid], lookup)
return find_nearest(l[mid:], lookup)
so that:
from datetime import date
d = {
date(2017, 1, 1): 95,
date(2018, 1, 1): 100,
date(2018, 6, 1): 110,
date(2018, 9, 1): 112,
}
d[find_nearest(list(d), date(2018, 8, 1))]
returns: 110
Upvotes: 2
Reputation: 4348
Edit I just realized you wanted a core module – my answer uses pandas!
If you have unique date values, you can use pandas to create a dataframe which uses the dates as indices:
df = pd.DataFrame.from_dict(rates, orient='index', columns=['value'])
# Convert index to pandas datetime
df.index = pd.to_datetime(df.index)
This returns:
value
2017-01-01 95
2018-01-01 100
2018-06-01 110
2018-09-01 112
Then:
def lookup(date, df):
# Convert input to datetime
date = pd.to_datetime(date)
# Get closest date in index
closest_date = min(df.index, key=lambda x: abs(x - date))
# Find corresponding index of closest date
index = np.where(df.index == closest_date)[0][0]
# If the date found if greater than the input, then get the date at the index before
if closest_date > date:
index -= 1
return df.iloc[index].value
>> lookup('2018-06-02', df)
Out: 110
>> lookup('2018-05-01', df)
Out: 100
Upvotes: 0
Reputation: 387
sortedcontainers may be what you want.
It will keep the key in sorted order rather than insert order, which is different from collections.OrderedDict
.
Install
$ pip install sortedcontainers
To achieve what you want
from sortedcontainers import SortedDict
def find_nearest(sorted_dict, lookup):
key = sorted_dict.iloc[sorted_dict.bisect_left(lookup) - 1]
return sorted_dict[key]
sd = SortedDict({0: '0', 4: '4', 8: '8', 12: '12'})
print(find_nearest(sd, 4)) # 0
print(find_nearest(sd, 3)) # 0
print(find_nearest(sd, 12)) # 8
The time complexity of this method is O(log n)
Upvotes: 2
Reputation: 1247
you could try .get() method, which returns a value only if it exists, otherwise returns None
import datetime
from datetime import date
def findNearest(somedate, dictionary):
while dictionary.get(somedate) is None:
somedate=somedate-datetime.timedelta(1)
return dictionary.get(somedate)
result=findNearest(date(2017, 1, 3), yourDictionary)
when you print result it will print '95', the value for date(2017, 1, 1)
Upvotes: -1