tda

Reputation: 2133

Loop over 24-hour period using list of dates in Python

I have a list of np.datetime64 dates in Python:

['2016-12-01T02:00:00.000000000', '2016-12-01T04:00:00.000000000',
 '2016-12-01T06:00:00.000000000', '2016-12-01T08:00:00.000000000',
 '2016-12-01T10:00:00.000000000', '2016-12-01T12:00:00.000000000', 
 '2016-12-01T14:00:00.000000000', '2016-12-01T16:00:00.000000000', 
 '2016-12-01T18:00:00.000000000', '2016-12-01T20:00:00.000000000', 
 '2016-12-01T22:00:00.000000000', '2016-12-02T00:00:00.000000000', 
 '2016-12-02T02:00:00.000000000', '2016-12-02T04:00:00.000000000', 
 '2016-12-02T06:00:00.000000000', '2016-12-02T08:00:00.000000000', 
 '2016-12-02T10:00:00.000000000', '2016-12-02T12:00:00.000000000', 
 '2016-12-02T14:00:00.000000000', '2016-12-02T16:00:00.000000000', 
 '2016-12-02T18:00:00.000000000', '2016-12-02T20:00:00.000000000', 
 '2016-12-02T22:00:00.000000000', '2016-12-03T00:00:00.000000000', 
 '2016-12-03T02:00:00.000000000', '2016-12-03T04:00:00.000000000',
 '2016-12-03T06:00:00.000000000', '2016-12-03T08:00:00.000000000', 
 '2016-12-03T10:00:00.000000000', '2016-12-03T12:00:00.000000000', 
 '2016-12-03T14:00:00.000000000', '2016-12-03T16:00:00.000000000', 
 '2016-12-03T18:00:00.000000000', '2016-12-03T20:00:00.000000000', 
 '2016-12-03T22:00:00.000000000']

and I wish to loop over each calendar day within the list. I have tried to extract each unique date from the list (i.e. finding the min and max date and creating a list of dates between these) but this isn't ideal for what I want to do.

My desired outcome would be to have code that allows me to loop over each date/calendar day found in the list and obtain the datetimes corresponding to this date:

for each_date in date_list:
    ***get all datetimes corresponding to each_date***

(loop would occur 3 times in this example)

NOTE:

1) Solutions that iterate over fixed-size slices such as [n:n+24] will not work, as not every day will have the same number of time steps.

Upvotes: 2

Views: 1118

Answers (2)

willeM_ Van Onsem

Reputation: 477210

If the timestamps are ordered, we can use the itertools.groupby function to group the elements of the array by the corresponding day.

The day can be obtained with np.datetime64.astype(..., dtype='datetime64[D]'), so we can write it like:

from numpy import datetime64
from functools import partial
from itertools import groupby

for day, timestamps in groupby(data_array,
                               partial(datetime64.astype, dtype='datetime64[D]')):
    # process day and timestamps
    pass

Here day is a datetime64[D] numpy object (it contains only the day), and timestamps is an iterable (not a list, but we can convert it to a list) of the corresponding timestamps. data_array is the array that contains the initial data.

For example:

>>> for day, timestamps in groupby(data_array,
...                                partial(datetime64.astype, dtype='datetime64[D]')):
...     print((day, list(timestamps)))
... 
(numpy.datetime64('2016-12-01'), [numpy.datetime64('2016-12-01T02:00:00.000000000'), numpy.datetime64('2016-12-01T04:00:00.000000000'), numpy.datetime64('2016-12-01T06:00:00.000000000'), numpy.datetime64('2016-12-01T08:00:00.000000000'), numpy.datetime64('2016-12-01T10:00:00.000000000'), numpy.datetime64('2016-12-01T12:00:00.000000000'), numpy.datetime64('2016-12-01T14:00:00.000000000'), numpy.datetime64('2016-12-01T16:00:00.000000000'), numpy.datetime64('2016-12-01T18:00:00.000000000'), numpy.datetime64('2016-12-01T20:00:00.000000000'), numpy.datetime64('2016-12-01T22:00:00.000000000')])
(numpy.datetime64('2016-12-02'), [numpy.datetime64('2016-12-02T00:00:00.000000000'), numpy.datetime64('2016-12-02T02:00:00.000000000'), numpy.datetime64('2016-12-02T04:00:00.000000000'), numpy.datetime64('2016-12-02T06:00:00.000000000'), numpy.datetime64('2016-12-02T08:00:00.000000000'), numpy.datetime64('2016-12-02T10:00:00.000000000'), numpy.datetime64('2016-12-02T12:00:00.000000000'), numpy.datetime64('2016-12-02T14:00:00.000000000'), numpy.datetime64('2016-12-02T16:00:00.000000000'), numpy.datetime64('2016-12-02T18:00:00.000000000'), numpy.datetime64('2016-12-02T20:00:00.000000000'), numpy.datetime64('2016-12-02T22:00:00.000000000')])
(numpy.datetime64('2016-12-03'), [numpy.datetime64('2016-12-03T00:00:00.000000000'), numpy.datetime64('2016-12-03T02:00:00.000000000'), numpy.datetime64('2016-12-03T04:00:00.000000000'), numpy.datetime64('2016-12-03T06:00:00.000000000'), numpy.datetime64('2016-12-03T08:00:00.000000000'), numpy.datetime64('2016-12-03T10:00:00.000000000'), numpy.datetime64('2016-12-03T12:00:00.000000000'), numpy.datetime64('2016-12-03T14:00:00.000000000'), numpy.datetime64('2016-12-03T16:00:00.000000000'), numpy.datetime64('2016-12-03T18:00:00.000000000'), numpy.datetime64('2016-12-03T20:00:00.000000000'), numpy.datetime64('2016-12-03T22:00:00.000000000')])

So here, for every day, we have opted to print a list of the corresponding timestamps, but this is of course only one of the options. As the example shows, not all slices have the same length (the last two have an extra element).

Note that timestamps is an iterator and thus gets exhausted: if you do not convert it to a list, it can only be looped over once.

The groupby works in linear time, since for each element it only checks whether the "group key" is the same as that of the previous element. As said before, the constraint is that the data must be ordered.
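If the data is not guaranteed to be ordered, one possible workaround is to sort it before grouping, at the cost of an O(n log n) sort. The sketch below is only an illustration: the sample array `unordered` and the helper `to_day` are hypothetical names, not part of the question, and each group is converted to a list right away so it survives the loop.

```python
from itertools import groupby

import numpy as np

# Hypothetical, deliberately unordered sample data (not from the question).
unordered = np.array(
    ["2016-12-02T04:00", "2016-12-01T22:00", "2016-12-02T00:00", "2016-12-01T02:00"],
    dtype="datetime64[ns]",
)


def to_day(ts):
    """Truncate a single np.datetime64 value to day precision."""
    return ts.astype("datetime64[D]")


# Sort first so that timestamps of the same day are adjacent, then group.
# list(stamps) materializes each group before groupby moves on.
groups = {day: list(stamps) for day, stamps in groupby(np.sort(unordered), key=to_day)}

for day, stamps in groups.items():
    print(day, len(stamps))
```

Storing the groups in a dict keeps them available after the loop, which the lazy group iterators would not allow.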

Upvotes: 3

jpp

Reputation: 164773

You can use collections.defaultdict for an O(n) solution. You can use Pandas to normalize your datetime objects, although this should also be possible via NumPy.

import pandas as pd
from collections import defaultdict

d = defaultdict(list)

# L is the list of np.datetime64 values from the question
for item in L:
    day = pd.to_datetime(item).normalize().to_datetime64()
    d[day].append(item)

print(d)

defaultdict(list,
            {numpy.datetime64('2016-12-01T00:00:00.000000000'):
                 [numpy.datetime64('2016-12-01T02:00:00.000000000'),
                  ...
                  numpy.datetime64('2016-12-01T22:00:00.000000000')],
             numpy.datetime64('2016-12-02T00:00:00.000000000'):
                 [numpy.datetime64('2016-12-02T00:00:00.000000000'),
                  ...
                  numpy.datetime64('2016-12-02T22:00:00.000000000')],
             numpy.datetime64('2016-12-03T00:00:00.000000000'):
                 [numpy.datetime64('2016-12-03T00:00:00.000000000'),
                  ...
                  numpy.datetime64('2016-12-03T22:00:00.000000000')]})
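For the NumPy-only route mentioned above, a minimal sketch could truncate each value with astype('datetime64[D]') instead of going through Pandas. The sample array `stamps` and the name `by_day` are hypothetical, and the keys here are day-precision datetime64[D] values rather than midnight timestamps.

```python
from collections import defaultdict

import numpy as np

# Hypothetical sample data; it does not need to be sorted for this approach.
stamps = np.array(
    ["2016-12-02T04:00", "2016-12-01T22:00", "2016-12-02T00:00"],
    dtype="datetime64[ns]",
)

by_day = defaultdict(list)

for item in stamps:
    # astype("datetime64[D]") drops the time of day, leaving the calendar day.
    by_day[item.astype("datetime64[D]")].append(item)

for day, items in by_day.items():
    print(day, len(items))
```

Because the dictionary is keyed by day, this variant still makes a single pass over the data and does not depend on the input order.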

Upvotes: 1
