Bach
Bach

Reputation: 6217

Extracting latest datetime from a list "every second"

Suppose I have a sorted list of datetime objects, times. I wish to extract from that list another list, which may be either longer or shorter, but that will include at least some (if not all) of the elements in the original list, but some may repeat, in the following manner:

First a virtual "start time" will be set, which is the latest rounded-to-the-second time which is no later than any of the datetimes in the original list. For example, if the original list is

dts = [datetime.datetime(2014, 8, 15, 13, 1, 41, 658749),
       datetime.datetime(2014, 8, 15, 13, 1, 42, 158749),
       datetime.datetime(2014, 8, 15, 13, 1, 42, 258749),
       datetime.datetime(2014, 8, 15, 13, 1, 43, 658749),
       datetime.datetime(2014, 8, 15, 13, 1, 45, 658749)]

then the "start time" will be

start_time = datetime.datetime(2014, 8, 15, 13, 1, 41)

Then, this "start_time" progresses one second per iteration, and each such iteration yields the last datetime from the original list which is not later than the current "time". It halts when all original datetimes are earlier than the current time. For example, given the above original list, the output should be

output = [datetime.datetime(2014, 8, 15, 13, 1, 41, 658749),
          datetime.datetime(2014, 8, 15, 13, 1, 42, 258749),
          datetime.datetime(2014, 8, 15, 13, 1, 43, 658749),
          datetime.datetime(2014, 8, 15, 13, 1, 43, 658749),
          datetime.datetime(2014, 8, 15, 13, 1, 45, 658749)]

Can you think of a simple (and efficient) way to achieve this?

Upvotes: 0

Views: 47

Answers (1)

Martijn Pieters
Martijn Pieters

Reputation: 1121386

Use the bisect module to locate each datetime object, in a generator function. Limit the starting parameter to the last location you found. Each loop then takes at most log N steps to find the next value, where N is the remainder of the elements that are still candidates.

Producing the start value is as simple as calling .replace(microsecond=0) on the first value; the end value is just the last value in your sorted input; as long as your 'current' position is lower than the end, add one second and find its insertion point; the value before it is the one to yield.

import bisect
import datetime

def generate_last(dts):
    current = dts[0].replace(microsecond=0)
    end = dts[-1]
    pos = 0
    while current < end:
        current += datetime.timedelta(seconds=1)
        pos = bisect.bisect(dts, current, lo=pos)
        yield dts[pos - 1]

Demo:

>>> import bisect
>>> import datetime
>>> def generate_last(dts):
...     current = dts[0].replace(microsecond=0)
...     end = dts[-1]
...     pos = 0
...     while current < end:
...         current += datetime.timedelta(seconds=1)
...         pos = bisect.bisect(dts, current, lo=pos)
...         yield dts[pos - 1]
... 
>>> dts = [datetime.datetime(2014, 8, 15, 13, 1, 41, 658749),
...        datetime.datetime(2014, 8, 15, 13, 1, 42, 158749),
...        datetime.datetime(2014, 8, 15, 13, 1, 42, 258749),
...        datetime.datetime(2014, 8, 15, 13, 1, 43, 658749),
...        datetime.datetime(2014, 8, 15, 13, 1, 45, 658749)]
>>> pprint(list(generate_last(dts)))
[datetime.datetime(2014, 8, 15, 13, 1, 41, 658749),
 datetime.datetime(2014, 8, 15, 13, 1, 42, 258749),
 datetime.datetime(2014, 8, 15, 13, 1, 43, 658749),
 datetime.datetime(2014, 8, 15, 13, 1, 43, 658749),
 datetime.datetime(2014, 8, 15, 13, 1, 45, 658749)]

Upvotes: 3

Related Questions