Enrico

Reputation: 2037

Better way to extract duration list from datetime list

I have a list of datetimes and want to convert it into a list of durations, in seconds, measured from the first datetime. The code below works, but it seems like overkill: I first convert the list to a numpy array, then compute the duration array, and finally convert it back into a list of seconds. I run into this often, so I would like to know the most efficient way to do it.

import datetime
from numpy import *

times = [datetime.datetime(2014, 6, 23, 18, 56, 30),
 datetime.datetime(2014, 6, 23, 18, 57),
 datetime.datetime(2014, 6, 23, 18, 57, 30),
 datetime.datetime(2014, 6, 23, 18, 58),
 datetime.datetime(2014, 6, 23, 18, 58, 30),
 datetime.datetime(2014, 6, 23, 18, 59),
 datetime.datetime(2014, 6, 23, 18, 59, 30)]

seconds = array(times)
start = times[0]
duration = seconds - start

secs = []
for item in duration:
    secs.append(item.seconds)

# result: secs = [0, 30, 60, 90, 120, 150, 180]

Upvotes: 0

Views: 538

Answers (4)

sebastian

Reputation: 9696

numpy.diff should work: http://docs.scipy.org/doc/numpy/reference/generated/numpy.diff.html

It should be faster once your list of datetimes becomes large (I'm not sure why you're using numpy for the code above). You could probably gain even more performance by switching to numpy datetime types.

>>> import numpy
>>> times = numpy.array(times)
>>> diffs = numpy.diff(times)
>>> diffs
array([datetime.timedelta(0, 30), datetime.timedelta(0, 30),
       datetime.timedelta(0, 30), datetime.timedelta(0, 30),
       datetime.timedelta(0, 30), datetime.timedelta(0, 30)], dtype=object)

If you want the raw number of seconds, you can get it via the timedelta.total_seconds() method:

seconds = [x.total_seconds() for x in diffs]

EDIT:

If all deltas are supposed to be relative to the first datetime value, then you can simply do:

seconds = [x.total_seconds() for x in times - times[0]]

No need for diff then...
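
As a rough sketch of the numpy datetime types mentioned above: this assumes naive datetimes and second resolution, and the names stamps, since_start and gaps are only illustrative. The conversion could look something like this:

```python
import datetime
import numpy as np

# Same sample data as in the question, built with a comprehension.
first = datetime.datetime(2014, 6, 23, 18, 56, 30)
times = [first + datetime.timedelta(seconds=30 * i) for i in range(7)]

# Convert once to a native datetime64 array (second resolution is an assumption).
stamps = np.array(times, dtype="datetime64[s]")

# Offsets from the first timestamp, as a float array of seconds.
since_start = (stamps - stamps[0]) / np.timedelta64(1, "s")

# Gaps between consecutive timestamps, also in seconds.
gaps = np.diff(stamps) / np.timedelta64(1, "s")

print(since_start.tolist())  # [0.0, 30.0, 60.0, 90.0, 120.0, 150.0, 180.0]
print(gaps.tolist())         # [30.0, 30.0, 30.0, 30.0, 30.0, 30.0]
```

Dividing by numpy.timedelta64(1, "s") turns the timedelta64 values into plain floats, so no per-element Python call is needed.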

Upvotes: 1

dawg

Reputation: 103884

With the line duration = seconds - start you create a numpy array of timedelta objects:

>>> duration
[datetime.timedelta(0) datetime.timedelta(0, 30) datetime.timedelta(0, 60) datetime.timedelta(0, 90) datetime.timedelta(0, 120) datetime.timedelta(0, 150) datetime.timedelta(0, 180)]

So you can get what you want directly with numpy.vectorize, which builds a new array holding the total seconds of each element of the duration array.

If you are just doing this once, you can use vectorize as a map-like throw-away function:

>>> vectorize(lambda td: td.total_seconds())(duration)
[   0.   30.   60.   90.  120.  150.  180.]

Or keep it to use multiple times:

>>> v = vectorize(lambda td: td.total_seconds())
>>> v(duration), v(duration*2)
[   0.   30.   60.   90.  120.  150.  180.] [   0.   60.  120.  180.  240.  300.  360.]

The advantage is that if you are working in numpy, the result stays a numpy array, with no conversion to a Python list as a list comprehension would produce.

Upvotes: 1

dano

Reputation: 94891

You can do the subtraction on the datetime objects directly:

>>> [(a - times[0]).total_seconds() for a in times]
[0.0, 30.0, 60.0, 90.0, 120.0, 150.0, 180.0]

When you subtract two datetime.datetime objects, you get a datetime.timedelta object back, which represents the amount of time between the two datetimes. So you can just iterate over the list, subtract the first time from each time, and call the total_seconds() method on the resulting timedelta to get the difference in seconds.
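
One detail worth checking when the datetimes can span more than a day: timedelta.seconds only holds the seconds component (0 to 86399) and ignores whole days, while total_seconds() covers the full duration. A minimal illustration, with made-up values for start and later:

```python
import datetime

start = datetime.datetime(2014, 6, 23, 18, 56, 30)
# Hypothetical later timestamp: one day and 30 seconds after start.
later = start + datetime.timedelta(days=1, seconds=30)

delta = later - start
print(delta.seconds)          # 30 (the whole day is not included)
print(delta.total_seconds())  # 86430.0 (full duration in seconds)
```

For the sample data in the question both give the same numbers, since all times fall within a few minutes of each other.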

Upvotes: 4

Jamie Counsell

Reputation: 8123

Something like this would work (no numpy required):

import datetime

times = [datetime.datetime(2014, 6, 23, 18, 56, 30),
    datetime.datetime(2014, 6, 23, 18, 57),
    datetime.datetime(2014, 6, 23, 18, 57, 30),
    datetime.datetime(2014, 6, 23, 18, 58),
    datetime.datetime(2014, 6, 23, 18, 58, 30),
    datetime.datetime(2014, 6, 23, 18, 59),
    datetime.datetime(2014, 6, 23, 18, 59, 30)]

start = times[0]
# .seconds ignores whole days; that is fine here because all times fall on the same day
output = [(t - start).seconds for t in times]

print(output)
# [0, 30, 60, 90, 120, 150, 180]

Edit: I see I was beaten to it! Good work :D

Upvotes: 1
