Reputation: 2133
I have a list of np.datetime64 dates in Python:
['2016-12-01T02:00:00.000000000', '2016-12-01T04:00:00.000000000',
'2016-12-01T06:00:00.000000000', '2016-12-01T08:00:00.000000000',
'2016-12-01T10:00:00.000000000', '2016-12-01T12:00:00.000000000',
'2016-12-01T14:00:00.000000000', '2016-12-01T16:00:00.000000000',
'2016-12-01T18:00:00.000000000', '2016-12-01T20:00:00.000000000',
'2016-12-01T22:00:00.000000000', '2016-12-02T00:00:00.000000000',
'2016-12-02T02:00:00.000000000', '2016-12-02T04:00:00.000000000',
'2016-12-02T06:00:00.000000000', '2016-12-02T08:00:00.000000000',
'2016-12-02T10:00:00.000000000', '2016-12-02T12:00:00.000000000',
'2016-12-02T14:00:00.000000000', '2016-12-02T16:00:00.000000000',
'2016-12-02T18:00:00.000000000', '2016-12-02T20:00:00.000000000',
'2016-12-02T22:00:00.000000000', '2016-12-03T00:00:00.000000000',
'2016-12-03T02:00:00.000000000', '2016-12-03T04:00:00.000000000',
'2016-12-03T06:00:00.000000000', '2016-12-03T08:00:00.000000000',
'2016-12-03T10:00:00.000000000', '2016-12-03T12:00:00.000000000',
'2016-12-03T14:00:00.000000000', '2016-12-03T16:00:00.000000000',
'2016-12-03T18:00:00.000000000', '2016-12-03T20:00:00.000000000',
'2016-12-03T22:00:00.000000000']
and I wish to loop over each calendar day within the list. I have tried to extract each unique date from the list (i.e. finding the min and max date and creating a list of dates between these) but this isn't ideal for what I want to do.
My desired outcome would be to have code that allows me to loop over each date/calendar day found in the list and obtain the datetimes corresponding to this date:
for each_date in date_list:
    ***get all datetimes corresponding to each_date***
(The loop would run 3 times in this example.)
NOTE: Solutions that iterate over fixed-size slices such as [n:n+24] will not work, because not every day has the same number of time steps.
Upvotes: 2
Views: 1118
Reputation: 477210
If the timestamps are ordered, we can use the itertools.groupby function to group the elements of the array by their day. The day can be obtained with np.datetime64.astype(..., dtype='datetime64[D]'), which truncates a timestamp to day precision, so we can write:
from numpy import datetime64
from functools import partial
from itertools import groupby

# The key function truncates each timestamp to day precision.
for day, timestamps in groupby(data_array,
                               partial(datetime64.astype, dtype='datetime64[D]')):
    # process day and timestamps
    pass
Here day is a datetime64[D] NumPy object (it contains only the date), and timestamps is an iterable (not a list, but it can be converted to one) of the corresponding timestamps. data_array is the array that contains the initial data.
For example:
>>> for day, timestamps in groupby(data_array,
...                                partial(datetime64.astype, dtype='datetime64[D]')):
...     print((day, list(timestamps)))
...
(numpy.datetime64('2016-12-01'), [numpy.datetime64('2016-12-01T02:00:00.000000000'), numpy.datetime64('2016-12-01T04:00:00.000000000'), numpy.datetime64('2016-12-01T06:00:00.000000000'), numpy.datetime64('2016-12-01T08:00:00.000000000'), numpy.datetime64('2016-12-01T10:00:00.000000000'), numpy.datetime64('2016-12-01T12:00:00.000000000'), numpy.datetime64('2016-12-01T14:00:00.000000000'), numpy.datetime64('2016-12-01T16:00:00.000000000'), numpy.datetime64('2016-12-01T18:00:00.000000000'), numpy.datetime64('2016-12-01T20:00:00.000000000'), numpy.datetime64('2016-12-01T22:00:00.000000000')])
(numpy.datetime64('2016-12-02'), [numpy.datetime64('2016-12-02T00:00:00.000000000'), numpy.datetime64('2016-12-02T02:00:00.000000000'), numpy.datetime64('2016-12-02T04:00:00.000000000'), numpy.datetime64('2016-12-02T06:00:00.000000000'), numpy.datetime64('2016-12-02T08:00:00.000000000'), numpy.datetime64('2016-12-02T10:00:00.000000000'), numpy.datetime64('2016-12-02T12:00:00.000000000'), numpy.datetime64('2016-12-02T14:00:00.000000000'), numpy.datetime64('2016-12-02T16:00:00.000000000'), numpy.datetime64('2016-12-02T18:00:00.000000000'), numpy.datetime64('2016-12-02T20:00:00.000000000'), numpy.datetime64('2016-12-02T22:00:00.000000000')])
(numpy.datetime64('2016-12-03'), [numpy.datetime64('2016-12-03T00:00:00.000000000'), numpy.datetime64('2016-12-03T02:00:00.000000000'), numpy.datetime64('2016-12-03T04:00:00.000000000'), numpy.datetime64('2016-12-03T06:00:00.000000000'), numpy.datetime64('2016-12-03T08:00:00.000000000'), numpy.datetime64('2016-12-03T10:00:00.000000000'), numpy.datetime64('2016-12-03T12:00:00.000000000'), numpy.datetime64('2016-12-03T14:00:00.000000000'), numpy.datetime64('2016-12-03T16:00:00.000000000'), numpy.datetime64('2016-12-03T18:00:00.000000000'), numpy.datetime64('2016-12-03T20:00:00.000000000'), numpy.datetime64('2016-12-03T22:00:00.000000000')])
Here we have opted to print, for every day, a list of the corresponding timestamps, but that is only one of the options. As the example shows, not all groups have the same length (the last two have one more element than the first).
Note that timestamps is an iterator, so it is exhausted after a single pass. If you need to go over a group more than once, convert it to a list first.
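To make the exhaustion concrete, here is a minimal sketch. The three sample timestamps and the helper name day_of are made up for illustration; they are not part of the question's data.

```python
from itertools import groupby

import numpy as np

# Hypothetical sample: two timestamps on one day, one on the next.
data_array = np.array(
    ["2016-12-01T02:00", "2016-12-01T04:00", "2016-12-02T00:00"],
    dtype="datetime64[ns]",
)


def day_of(ts):
    """Truncate a datetime64 value to day precision."""
    return ts.astype("datetime64[D]")


# A second pass over the same group iterator yields nothing.
second_passes = []
for day, timestamps in groupby(data_array, day_of):
    first = list(timestamps)   # consumes the group
    second = list(timestamps)  # iterator is already exhausted
    second_passes.append(second)

# Materializing each group as a list keeps it reusable.
groups = {day: list(ts) for day, ts in groupby(data_array, day_of)}
sizes = [len(stamps) for stamps in groups.values()]
```

After this runs, every entry of second_passes is an empty list, while the lists stored in groups can be iterated as often as needed.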
groupby works in linear time, since for each element it only checks whether the "group key" is the same as that of the previous element. As said before, the constraint is that the data must be ordered.
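If the input is not guaranteed to be ordered, one option is to sort it first. The sketch below uses np.sort on made-up sample data; note that sorting costs O(n log n), so the whole procedure is no longer linear.

```python
from itertools import groupby

import numpy as np

# Hypothetical unordered sample: two days, interleaved.
unordered = np.array(
    [
        "2016-12-02T00:00",
        "2016-12-01T04:00",
        "2016-12-02T06:00",
        "2016-12-01T02:00",
    ],
    dtype="datetime64[ns]",
)


def day_of(ts):
    """Truncate a datetime64 value to day precision."""
    return ts.astype("datetime64[D]")


# Without sorting, groupby starts a new group at every change of day.
unsorted_groups = [(day, list(ts)) for day, ts in groupby(unordered, day_of)]

# After sorting, each calendar day forms exactly one group.
ordered = np.sort(unordered)
sorted_groups = [(day, list(ts)) for day, ts in groupby(ordered, day_of)]

print(len(unsorted_groups), len(sorted_groups))
```

On this sample the unsorted input is split into four groups, while the sorted input gives one group per day.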
Upvotes: 3
Reputation: 164773
You can use collections.defaultdict for an O(n) solution. You can use Pandas to normalize your datetime objects, although this should also be possible via NumPy.
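import pandas as pd
from collections import defaultdict

d = defaultdict(list)

# L is the list of np.datetime64 values from the question.
for item in L:
    # Truncate to midnight and convert back to np.datetime64 for use as the key.
    day = pd.to_datetime(item).normalize().to_datetime64()
    d[day].append(item)

print(d)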
defaultdict(list,
{numpy.datetime64('2016-12-01T00:00:00.000000000'):
[numpy.datetime64('2016-12-01T02:00:00.000000000'),
...
numpy.datetime64('2016-12-01T22:00:00.000000000')],
numpy.datetime64('2016-12-02T00:00:00.000000000'):
[numpy.datetime64('2016-12-02T00:00:00.000000000'),
...
numpy.datetime64('2016-12-02T22:00:00.000000000')],
numpy.datetime64('2016-12-03T00:00:00.000000000'):
[numpy.datetime64('2016-12-03T00:00:00.000000000'),
...
numpy.datetime64('2016-12-03T22:00:00.000000000')]})
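A NumPy-only variant might look like the sketch below, assuming every item is an np.datetime64 scalar. The short list L here is a made-up sample, deliberately unordered to show that this approach does not need sorted input. One difference from the Pandas version: the keys are day-precision values such as numpy.datetime64('2016-12-01') rather than midnight timestamps.

```python
from collections import defaultdict

import numpy as np

# Hypothetical sample standing in for the question's list.
L = [
    np.datetime64("2016-12-01T22:00:00.000000000"),
    np.datetime64("2016-12-02T00:00:00.000000000"),
    np.datetime64("2016-12-01T02:00:00.000000000"),
]

d = defaultdict(list)
for item in L:
    # Casting to datetime64[D] drops the time-of-day part.
    day = item.astype("datetime64[D]")
    d[day].append(item)

for day, stamps in d.items():
    print(day, len(stamps))
```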
Upvotes: 1