Onlyjus

Reputation: 6159

Dictionary searching with datetime keys

I have time series data that I am currently storing in a dictionary where the dictionary 'keys' are datetime.datetime objects. Something along the lines of:

data[datetime.datetime(2012,5,14,15,28,2)]={'error':error,'flags':flags,'value':value}

The question I have is: What is the best way to find the two closest times (one before and one after) a specified time? I need this function to be as fast as possible because it is called ~10,000 times inside a loop that linearly interpolates between the two closest points.


I currently have one method working which takes a ridiculously long time because it searches through all the keys (~50,000):

def findTime(time):
    keys=data.keys()
    bdt=10000000000000000000
    adt=10000000000000000000
    minKey=False
    maxKey=False
    for key in keys:
        dt=(time-key).total_seconds()
        if abs(dt)<bdt and dt>0:
            bdt=abs(dt)
            minKey=key
        elif abs(dt)<adt and dt<0:
            adt=abs(dt)
            maxKey=key
    return minKey,maxKey

My attempt at using bisect:

def findTime(time):
    keys=data.keys()
    l,r = bisect.bisect_left(time,keys), bisect.bisect_right(time,keys)
    return l,r

Unfortunately, this produces an error:

TypeError: 'datetime.datetime' object does not support indexing

Any help would be appreciated.

Upvotes: 4

Views: 10944

Answers (3)

dawg

Reputation: 103874

You are far better off using a different key for your dict.

Two are obvious.

1) You can use ISO 8601 date format as a string. This is essentially YYYY-MM-DD format. You can also use YYYY-MM-DDTHH:MM:SS format. A property of ISO 8601 is that it sorts lexically, so in a sorted list of keys you just take the two keys on either side of the insertion point.

2) You can use a float representation of the date. One form is a day offset from a fixed epoch in the integer part, with the fractional part giving the fraction of the day, which is easily converted to HH:MM:SS (Excel uses this approach). Another is seconds since 1970-01-01, which is Unix time and what time.time() returns.

Example of 1):

>>> datetime.datetime.fromtimestamp(time.time()).isoformat()
'2012-05-14T13:55:22.142548'  # a hashable, sortable dict key based on time

Example of 2):

>>> time.time()               # That is seconds (and fraction of a second) since 1/1/1970
1337028447.499273             # THAT is your dict key
>>> datetime.datetime.fromtimestamp(time.time()).timetuple()
time.struct_time(tm_year=2012, tm_mon=5, tm_mday=14, tm_hour=13, tm_min=52, tm_sec=13, tm_wday=0, tm_yday=135, tm_isdst=-1)

In either case, Python would be able to manage a data structure of 50,000 elements in milliseconds.

Convert the time stamp to a datetime object as needed.
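
As a rough sketch of option 1), the snippet below keys a dictionary by ISO 8601 strings and uses bisect on the sorted keys to pick the neighbours of a query time. The sample timestamps and the names iso_data, iso_keys, before and after are made up for illustration, and it assumes naive datetimes (no time zone offsets), since mixed offsets would break the lexical ordering.

```python
import bisect
import datetime

# Hypothetical sample data, keyed by ISO 8601 strings instead of datetime objects
stamps = [
    datetime.datetime(2012, 5, 14, 15, 28, 2),
    datetime.datetime(2012, 5, 14, 15, 30, 0),
    datetime.datetime(2012, 5, 14, 15, 29, 0),
]
iso_data = {ts.isoformat(): {'value': i} for i, ts in enumerate(stamps)}

# ISO 8601 strings sort lexically in chronological order, so sort them once
iso_keys = sorted(iso_data)

# Find the insertion point of the query, then take the keys on either side
query = datetime.datetime(2012, 5, 14, 15, 29, 30).isoformat()
i = bisect.bisect_left(iso_keys, query)
before = iso_keys[i - 1] if i > 0 else None
after = iso_keys[i] if i < len(iso_keys) else None
```

The strings can be turned back into datetime objects when the time difference is needed for interpolation.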

Upvotes: 3

Zeugma

Reputation: 32095

Building an index with the bisect module is an idea worth digging into. However, the documentation shows that the bisect functions take the sorted list as their first argument, not their second.

Try:

keys=sorted(data.keys())
bisect.bisect_left(keys,time), bisect.bisect_right(keys,time)

You can also optimize your code by constructing the sorted keys list outside of your findTime function. If your data dictionary is not modified between findTime calls, you pay for building the sorted list only once.
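
One thing worth checking before relying on the pair of results: bisect_left and bisect_right return list indices, not keys, and the two indices differ only when the searched time is itself a key. The small sketch below illustrates this with made-up sample data; the variable names between, exact, between_idx and exact_idx are hypothetical.

```python
import bisect
import datetime

# Hypothetical stand-in for the question's `data` dictionary
data = {
    datetime.datetime(2012, 5, 14, 15, 0, 0): {'value': 1.0},
    datetime.datetime(2012, 5, 14, 15, 10, 0): {'value': 2.0},
    datetime.datetime(2012, 5, 14, 15, 20, 0): {'value': 4.0},
}
keys = sorted(data)  # built once, outside any lookup function

between = datetime.datetime(2012, 5, 14, 15, 5, 0)   # falls between two keys
exact = datetime.datetime(2012, 5, 14, 15, 10, 0)    # is itself a key

# For a time between keys, both functions give the same insertion index
between_idx = (bisect.bisect_left(keys, between), bisect.bisect_right(keys, between))

# For a time that is a key, bisect_left points at it and bisect_right just past it
exact_idx = (bisect.bisect_left(keys, exact), bisect.bisect_right(keys, exact))
```

So for a time that is not a key, the neighbouring keys are keys[i - 1] and keys[i], where i is either result, provided i is not at the start or end of the list.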

Upvotes: 1

torek

Reputation: 488453

The bisect functions take as their first argument a sorted array (or list, or really, anything that can be indexed). keys is an unsorted array, and you're passing it as the second argument.

This should work:

def findTime(time):
    keys = sorted(data.keys())
    return bisect.bisect_left(keys, time), bisect.bisect_right(keys, time)

although you should keep the sorted copy around for repeated searches that have not altered the data, rather than re-sorting every time.
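
To get from the indices to what the question actually asks for, here is one possible sketch that keeps the sorted keys around, returns the neighbouring keys, and linearly interpolates the 'value' field. The functions find_neighbors and interpolate, and the sample data, are hypothetical names invented for this example. It assumes that an exact match should return the stored value directly, which is a design choice and not something the question specifies.

```python
import bisect
import datetime

def find_neighbors(sorted_keys, when):
    """Return (before, after) keys around `when`; None where no such key exists."""
    i = bisect.bisect_left(sorted_keys, when)
    if i < len(sorted_keys) and sorted_keys[i] == when:
        return when, when  # exact hit: assumption, treat the key as both neighbours
    before = sorted_keys[i - 1] if i > 0 else None
    after = sorted_keys[i] if i < len(sorted_keys) else None
    return before, after

def interpolate(data, sorted_keys, when):
    """Linearly interpolate data[...]['value'] at time `when`."""
    before, after = find_neighbors(sorted_keys, when)
    if before is None or after is None:
        raise ValueError("time is outside the range of the data")
    if before == after:
        return data[before]['value']
    frac = (when - before).total_seconds() / (after - before).total_seconds()
    v0, v1 = data[before]['value'], data[after]['value']
    return v0 + frac * (v1 - v0)

# Hypothetical sample data in the shape used in the question
data = {
    datetime.datetime(2012, 5, 14, 15, 10, 0): {'value': 2.0},
    datetime.datetime(2012, 5, 14, 15, 0, 0): {'value': 1.0},
}
sorted_keys = sorted(data)  # sort once, reuse for every lookup

result = interpolate(data, sorted_keys, datetime.datetime(2012, 5, 14, 15, 5, 0))
```

Each lookup is then a binary search over the sorted list instead of a scan over all ~50,000 keys; the list only needs re-sorting when data changes.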

Upvotes: 4
