Reputation: 6217
Suppose I have a sorted list of datetime objects, times
. I wish to extract from that list another list, which may be either longer or shorter, but that will include at least some (if not all) of the elements in the original list, but some may repeat, in the following manner:
First a virtual "start time" will be set, which is the latest rounded-to-the-second time which is no later than any of the datetimes in the original list. For example, if the original list is
dts = [datetime.datetime(2014, 8, 15, 13, 1, 41, 658749),
datetime.datetime(2014, 8, 15, 13, 1, 42, 158749),
datetime.datetime(2014, 8, 15, 13, 1, 42, 258749),
datetime.datetime(2014, 8, 15, 13, 1, 43, 658749),
datetime.datetime(2014, 8, 15, 13, 1, 45, 658749)]
then the "start time" will be
start_time = datetime.datetime(2014, 8, 15, 13, 1, 41)
Then, this "start_time" progresses one second per iteration, and each such iteration yields the last datetime from the original list which is not later than the current "time". It halts when all original datetimes are earlier than the current time. For example, given the above original list, the output should be
output = [datetime.datetime(2014, 8, 15, 13, 1, 41, 658749),
datetime.datetime(2014, 8, 15, 13, 1, 42, 258749),
datetime.datetime(2014, 8, 15, 13, 1, 43, 658749),
datetime.datetime(2014, 8, 15, 13, 1, 43, 658749),
datetime.datetime(2014, 8, 15, 13, 1, 45, 658749)]
Can you think of a simple (and efficient) way to achieve this?
Upvotes: 0
Views: 47
Reputation: 1121386
Use the bisect
module to locate each datetime
object, in a generator function. Limit the starting parameter to the last location you found. Each loop then takes at most log N steps to find the next value, where N is the remainder of the elements that are still candidates.
Producing the start value is as simple as calling .replace(microsecond=0)
on the first value; the end value is just the last value in your sorted input; as long as your 'current' position is lower than the end, add one second and find its insertion point; the value before it is the one to yield.
import bisect
import datetime
def generate_last(dts):
current = dts[0].replace(microsecond=0)
end = dts[-1]
pos = 0
while current < end:
current += datetime.timedelta(seconds=1)
pos = bisect.bisect(dts, current, lo=pos)
yield dts[pos - 1]
Demo:
>>> import bisect
>>> import datetime
>>> def generate_last(dts):
... current = dts[0].replace(microsecond=0)
... end = dts[-1]
... pos = 0
... while current < end:
... current += datetime.timedelta(seconds=1)
... pos = bisect.bisect(dts, current, lo=pos)
... yield dts[pos - 1]
...
>>> dts = [datetime.datetime(2014, 8, 15, 13, 1, 41, 658749),
... datetime.datetime(2014, 8, 15, 13, 1, 42, 158749),
... datetime.datetime(2014, 8, 15, 13, 1, 42, 258749),
... datetime.datetime(2014, 8, 15, 13, 1, 43, 658749),
... datetime.datetime(2014, 8, 15, 13, 1, 45, 658749)]
>>> pprint(list(generate_last(dts)))
[datetime.datetime(2014, 8, 15, 13, 1, 41, 658749),
datetime.datetime(2014, 8, 15, 13, 1, 42, 258749),
datetime.datetime(2014, 8, 15, 13, 1, 43, 658749),
datetime.datetime(2014, 8, 15, 13, 1, 43, 658749),
datetime.datetime(2014, 8, 15, 13, 1, 45, 658749)]
Upvotes: 3