Reputation: 10794
I have a numpy array called dt. Each element is of type datetime.timedelta. For example:
>>> dt[0]
datetime.timedelta(0, 1, 36000)
How can I convert dt into an array dt_sec that contains only the seconds, without looping? My current solution (which works, but I don't like it) is:
dt_sec = zeros((len(dt),1))
for i in range(0,len(dt),1):
    dt_sec[i] = dt[i].total_seconds()
I tried to use dt.total_seconds(), but of course it didn't work. Any idea how to avoid this loop?
Thanks
Upvotes: 10
Views: 16768
Reputation: 2553
Recommendation
It is recommended to convert as follows:
deltatime.astype("timedelta64[ms]").astype("int64")/1000
Problem with times.astype("timedelta64[ms]").astype(int)
The data type timedelta64 stores data as a 64-bit integer. On platforms where NumPy's default integer is 32 bits (for example, Windows with NumPy versions before 2.0), astype(int) converts the data into a 32-bit integer. Large values then overflow and the conversion silently gives wrong results, as demonstrated below:
import numpy as np
date_rng = np.arange(
    np.datetime64("2022-09-01"),
    np.datetime64("2022-09-30"),
    np.timedelta64(1, "D")
)
deltatime = date_rng - np.datetime64("2022-01-01")
print( deltatime.astype("timedelta64[ms]").astype(int) / 1000 )
# output:
#[-479636.48 -393236.48 -306836.48 -220436.48 -134036.48 -47636.48
# 38763.52 125163.52 211563.52 297963.52 384363.52 470763.52
# 557163.52 643563.52 729963.52 816363.52 902763.52 989163.52
# 1075563.52 1161963.52 1248363.52 1334763.52 1421163.52 1507563.52
# 1593963.52 1680363.52 1766763.52 1853163.52 1939563.52]
print( deltatime.astype("timedelta64[ms]").astype("int64")/1000 )
#[20995200. 21081600. 21168000. 21254400. 21340800. 21427200. 21513600.
# 21600000. 21686400. 21772800. 21859200. 21945600. 22032000. 22118400.
# 22204800. 22291200. 22377600. 22464000. 22550400. 22636800. 22723200.
# 22809600. 22896000. 22982400. 23068800. 23155200. 23241600. 23328000.
# 23414400.]
Upvotes: 0
Reputation: 42946
A convenient and elegant way is to wrap the array in a pandas.Series and use its dt.total_seconds() method:
import numpy as np
import pandas as pd
# create example datetime arrays
arr1 = np.array(['2007-07-13', '2006-01-13', '2010-08-13'], dtype='datetime64')
arr2 = np.array(['2007-07-15', '2006-01-18', '2010-08-22'], dtype='datetime64')
# timedelta array
td = arr2 - arr1
# get total seconds
pd.Series(td).dt.total_seconds()
0 172800.0
1 432000.0
2 777600.0
dtype: float64
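The example above starts from a timedelta64 array. For an object array of datetime.timedelta values, as in the question, a possible sketch is to go through pd.to_timedelta first; the sample values here are made up for illustration:

```python
import datetime

import numpy as np
import pandas as pd

# Object array of datetime.timedelta, as in the question (sample values assumed).
dt = np.array([datetime.timedelta(0, 1, 36000), datetime.timedelta(hours=1)])

# pd.to_timedelta builds a TimedeltaIndex; total_seconds() returns floats.
dt_sec = pd.to_timedelta(dt).total_seconds().to_numpy()
print(dt_sec)
```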
Upvotes: 1
Reputation: 60227
numpy has its own datetime and timedelta formats. Just use them ;).
Set-up for example:
import datetime
import numpy
times = numpy.array([datetime.timedelta(0, 1, 36000)])
Code:
times.astype("timedelta64[ms]").astype(int) / 1000
#>>> array([ 1.036])
Since people don't seem to realise that this is the best solution, here are some timings of a timedelta64 array vs a datetime.timedelta array:
SETUP="
import datetime
import numpy
times = numpy.array([datetime.timedelta(0, 1, 36000)] * 100000)
numpy_times = times.astype('timedelta64[ms]')
"
python -m timeit -s "$SETUP" "numpy_times.astype(int) / 1000"
python -m timeit -s "$SETUP" "numpy.vectorize(lambda x: x.total_seconds())(times)"
python -m timeit -s "$SETUP" "[delta.total_seconds() for delta in times]"
Results:
100 loops, best of 3: 4.54 msec per loop
10 loops, best of 3: 99.5 msec per loop
10 loops, best of 3: 67.1 msec per loop
The initial translation will take about twice as long as the vectorized expression, but every operation on that timedelta array from then on will be about 20 times faster.
If you're never going to use those timedeltas again, consider asking yourself why you ever made the deltas (as opposed to timedelta64s) in the first place, and then use the numpy.vectorize expression. It's less native but for some reason it's faster.
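As a variation that is not part of the timings above, here is a minimal sketch that avoids the integer cast altogether by dividing the timedelta64 array by a one-second timedelta64. The choice of microsecond resolution is an assumption, made because it matches datetime.timedelta:

```python
import datetime

import numpy as np

times = np.array([datetime.timedelta(0, 1, 36000)] * 3)

# Convert once to NumPy's native type; microseconds match the
# resolution of datetime.timedelta, so nothing is truncated.
numpy_times = times.astype("timedelta64[us]")

# Dividing by a one-second timedelta64 yields float64 seconds directly.
seconds = numpy_times / np.timedelta64(1, "s")
print(seconds)
```

Because no integer dtype is named anywhere, the result does not depend on the platform's default integer width.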
Upvotes: 13
Reputation: 6030
I like the use of np.vectorize as suggested by prgao. If you just want a Python list, you can also do the following (in Python 3, map returns an iterator, so wrap it in list):
dt_sec = list(map(datetime.timedelta.total_seconds, dt))
Upvotes: 0
Reputation: 13634
You could use a "list comprehension":
dt_sec = [delta.total_seconds() for delta in dt]
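Note that this is an ordinary Python-level loop that numpy does not vectorize, and the result is a list rather than an array; it is still reasonably quick for moderately sized inputs.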
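If an array is wanted rather than a list, one option is to feed a generator to np.fromiter. This is only a sketch with made-up sample values; it still iterates in Python, it just skips the intermediate list:

```python
import datetime

import numpy as np

# Sample object array of datetime.timedelta (values assumed for illustration).
dt = np.array([datetime.timedelta(0, 1, 36000), datetime.timedelta(seconds=90)])

# count lets NumPy allocate the output once instead of growing it.
dt_sec = np.fromiter(
    (delta.total_seconds() for delta in dt), dtype=float, count=len(dt)
)
print(dt_sec)
```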
Upvotes: -3
Reputation: 1787
import numpy as np
helper = np.vectorize(lambda x: x.total_seconds())
dt_sec = helper(dt)
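A small refinement, offered as a sketch rather than as part of the answer above: passing otypes tells np.vectorize the output dtype up front, which also lets the helper accept an empty array. The name to_seconds and the sample values are illustrative:

```python
import datetime

import numpy as np

# otypes fixes the result dtype so it is not inferred from the first element.
to_seconds = np.vectorize(lambda x: x.total_seconds(), otypes=[float])

dt = np.array([datetime.timedelta(0, 1, 36000), datetime.timedelta(minutes=2)])
dt_sec = to_seconds(dt)

# Without otypes, calling the helper on an empty array raises an error.
empty_sec = to_seconds(np.array([], dtype=object))

print(dt_sec, empty_sec)
```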
Upvotes: 14