Reputation: 6159
I have time series data that I am currently storing in a dictionary where the dictionary 'keys' are datetime.datetime
objects. Something along the lines of:
data[datetime.datetime(2012,5,14,15,28,2)]={'error':error,'flags':flags,'value':value}
The question I have is: What is the best way to find the closest two times (before and after) a specified time? I need this function to be as fast as possible because it is called many times (~10,000) inside a loop that linearly interpolates between the two closest points.
I currently have one method working which takes a ridiculously long time because it searches through all the keys (~50,000):
def findTime(time):
    keys=data.keys()
    bdt=10000000000000000000
    adt=10000000000000000000
    minKey=False
    maxKey=False
    for key in keys:
        dt=(time-key).total_seconds()
        if abs(dt)<bdt and dt>0:
            bdt=abs(dt)
            minKey=key
        elif abs(dt)<adt and dt<0:
            adt=abs(dt)
            maxKey=key
    return minKey,maxKey
My attempt at using bisect:
def findTime(time):
    keys=data.keys()
    l,r = bisect.bisect_left(time,keys), bisect.bisect_right(time,keys)
    return l,r
Unfortunately, this produces an error:
TypeError: 'datetime.datetime' object does not support indexing
Any help would be appreciated.
Upvotes: 4
Views: 10944
Reputation: 103874
You are far better off using a different key for your dict.
Two are obvious.
1) You can use the ISO 8601 date format as a string. This is essentially YYYY-MM-DD format; with a time component it becomes YYYY-MM-DDTHH:MM:SS. A property of ISO 8601 is that it sorts lexically, so in a sorted list of keys you can just take the two keys above and below the insertion point (see the sketch at the end of this answer).
2) You can use a float representation of the dates, with the integer part being a day offset from a fixed epoch and the fractional part being the fraction of the day, which is easily converted to HH:MM:SS. Excel and Windows OLE dates use this day-based approach; Unix time stamps count seconds since the epoch instead.
Example of 1):
>>> datetime.datetime.fromtimestamp(time.time()).isoformat()
'2012-05-14T13:55:22.142548' # a hashable, sortable dict key based on time
Example of 2):
>>> time.time() # seconds (and a fraction of a second) since 1/1/1970
1337028447.499273 # THAT is your dict key
>>> datetime.datetime.fromtimestamp(time.time()).timetuple()
time.struct_time(tm_year=2012, tm_mon=5, tm_mday=14, tm_hour=13, tm_min=52, tm_sec=13, tm_wday=0, tm_yday=135, tm_isdst=-1)
In either case, Python would be able to manage a data structure of 50,000 elements in milliseconds.
Convert the time stamp to a datetime object as needed.
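Here is a minimal sketch of option 1), assuming a hypothetical data dict keyed by ISO 8601 strings and ignoring the edge cases at either end of the key range:
import bisect
import datetime

# hypothetical data keyed by ISO 8601 strings; lexical order == chronological order
data = {
    '2012-05-14T15:28:02': {'value': 1.0},
    '2012-05-14T15:29:02': {'value': 2.0},
    '2012-05-14T15:30:02': {'value': 3.0},
}
sorted_keys = sorted(data)   # build once

def neighbours(when):
    # convert the datetime to the same string form, then bisect the sorted keys
    target = when.isoformat()
    i = bisect.bisect_left(sorted_keys, target)
    return sorted_keys[i - 1], sorted_keys[i]   # key just before, key at or just after

neighbours(datetime.datetime(2012, 5, 14, 15, 28, 30))
# -> ('2012-05-14T15:28:02', '2012-05-14T15:29:02')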
Upvotes: 3
Reputation: 32095
Creating an index based on the bisect module seems to be a valuable idea to dig into. However, by looking at the documentation, you will see that the bisect functions take the sorted list as their first argument, not their second.
Try:
keys=sorted(data.keys())
bisect.bisect_left(keys,time), bisect.bisect_right(keys,time)
Also, you can try to optimize your code by constructing the sorted keys list outside of your findTime function. If your data dictionary is not modified between your findTime calls, you will pay the cost of building the sorted list only once (see the sketch below).
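Roughly, that could look like the following sketch, which assumes data is already populated and is not modified while the loop runs:
import bisect

sorted_keys = sorted(data.keys())   # built once, outside the loop

def findTime(time, keys=sorted_keys):
    # bisect wants the sorted sequence first, then the value to locate
    return bisect.bisect_left(keys, time), bisect.bisect_right(keys, time)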
Upvotes: 1
Reputation: 488453
The bisect functions take as their first argument a sorted array (or list, or really, anything that can be indexed). keys is an unsorted array, and you're passing it as the second argument.
This should work:
def findTime(time):
    keys = sorted(data.keys())
    return bisect.bisect_left(keys, time), bisect.bisect_right(keys, time)
although you should keep the sorted copy around for repeated searches that have not altered the data, rather than re-sorting every time.
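For the interpolation loop in the question, one way to wire this up is sketched below; it assumes the requested time always falls strictly between the smallest and largest keys, and that each entry carries a numeric 'value' field as in the question:
import bisect

sorted_keys = sorted(data.keys())   # sort once, reuse for every lookup

def findTime(time, keys=sorted_keys):
    i = bisect.bisect_left(keys, time)
    return keys[i - 1], keys[i]   # key just before, key at or just after

def interpolate(time):
    before, after = findTime(time)
    frac = (time - before).total_seconds() / (after - before).total_seconds()
    v0, v1 = data[before]['value'], data[after]['value']
    return v0 + frac * (v1 - v0)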
Upvotes: 4