O. Mohsen

Reputation: 189

Splitting an array of dates into multiple lists as per month

I have the following array (it could be a list as well):

import datetime
import numpy as np

uniqueDates = np.array([datetime.date(2017, 4, 11), datetime.date(2017, 4, 12),
                        datetime.date(2017, 4, 20), datetime.date(2017, 4, 25),
                        datetime.date(2017, 5, 3), datetime.date(2017, 5, 4),
                        datetime.date(2017, 5, 10), datetime.date(2017, 5, 11),
                        datetime.date(2017, 6, 1), datetime.date(2017, 6, 13),
                        datetime.date(2017, 6, 15), datetime.date(2017, 7, 10),
                        datetime.date(2017, 7, 13), datetime.date(2017, 7, 17)])

I want to split this array into four lists, one per month (April, May, June, and July), each containing the dates that fall in that month. The intended result looks something like this:

monthsList = [[datetime.date(2017, 4, 11),
               datetime.date(2017, 4, 12),
               datetime.date(2017, 4, 20),
               datetime.date(2017, 4, 25)],
              [datetime.date(2017, 5, 3),
               datetime.date(2017, 5, 4),
               datetime.date(2017, 5, 10),
               datetime.date(2017, 5, 11)],
              [datetime.date(2017, 6, 1),
               datetime.date(2017, 6, 13),
               datetime.date(2017, 6, 15)],
              [datetime.date(2017, 7, 10),
               datetime.date(2017, 7, 13),
               datetime.date(2017, 7, 17)]]

Is there a function that can do this automatically, or should I loop over the elements and check them individually? I am looking for an efficient way to accomplish this task. I searched a few questions here on Stack Overflow but could not find what I am looking for.

Upvotes: 3

Views: 813

Answers (3)

Alain T.

Reputation: 42133

You can use groupby from itertools:

from itertools import groupby

grouped = [list(g) for _, g in groupby(uniqueDates, key=lambda d: (d.year, d.month))]

print(*(", ".join(map(str, g)) for g in grouped), sep="\n")

2017-04-11, 2017-04-12, 2017-04-20, 2017-04-25
2017-05-03, 2017-05-04, 2017-05-10, 2017-05-11
2017-06-01, 2017-06-13, 2017-06-15
2017-07-10, 2017-07-13, 2017-07-17

This also works if your input is a plain Python list, as long as dates from the same month are adjacent, because groupby only merges consecutive items. You shouldn't use numpy unless it is necessary.
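If the dates are not already ordered, groupby would start a new group every time the month changes. A minimal sketch of one way around that, assuming it is acceptable to sort first (the helper name group_by_month and the shuffled sample are made up for illustration):

```python
import datetime
from itertools import groupby


def month_key(d):
    """Grouping key: dates share a group when year and month both match."""
    return (d.year, d.month)


def group_by_month(dates):
    """Return one list per (year, month), in chronological order of the months."""
    # Sort by the same key that groupby uses, so equal keys become adjacent.
    ordered = sorted(dates, key=month_key)
    return [list(g) for _, g in groupby(ordered, key=month_key)]


# Hypothetical out-of-order input.
shuffled = [datetime.date(2017, 5, 3), datetime.date(2017, 4, 11),
            datetime.date(2017, 5, 4), datetime.date(2017, 4, 12)]

# Without sorting, every change of month opens a new group.
unsorted_group_count = len([key for key, _ in groupby(shuffled, key=month_key)])

groups = group_by_month(shuffled)
```

Here unsorted_group_count is 4 while groups holds only two lists, one for April and one for May. Sorting costs O(n log n), so for input that is already ordered the plain groupby call is enough.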

Upvotes: 0

Tom83B

Reputation: 2109

You can use pandas:

import pandas as pd

...

s = pd.Series(uniqueDates)
# Each item is a (month, Series) pair; call .tolist() on the Series for a plain list of dates
list(s.groupby(s.map(lambda x: x.month)))

Edit: as pointed out by Nidal Barada, his looping approach is significantly faster. Using %%timeit magic in Jupyter:

pandas: 562 µs ± 3.87 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Nidal Barada's answer: 8.14 µs ± 39.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Upvotes: 2

Nidal Barada

Reputation: 393

This works as long as dates from the same month are adjacent in the input, and it needs less computation:

Dates = []

for date in uniqueDates:
    # Extend the last group while the month is unchanged; otherwise start a new group
    if Dates and date.month == Dates[-1][0].month:
        Dates[-1].append(date)
    else:
        Dates.append([date])

Otherwise, if the dates can arrive in any order, use:

Dates = []

for date in uniqueDates:
    # Look for an existing group with the same month
    for group in Dates:
        if group[0].month == date.month:
            group.append(date)
            break
    else:
        # No group matched (or Dates is still empty): start a new one
        Dates.append([date])

Both produce:

[
    [datetime.date(2017, 4, 11), datetime.date(2017, 4, 12), datetime.date(2017, 4, 20), datetime.date(2017, 4, 25)],
    [datetime.date(2017, 5, 3), datetime.date(2017, 5, 4), datetime.date(2017, 5, 10), datetime.date(2017, 5, 11)],
    [datetime.date(2017, 6, 1), datetime.date(2017, 6, 13), datetime.date(2017, 6, 15)],
    [datetime.date(2017, 7, 10), datetime.date(2017, 7, 13), datetime.date(2017, 7, 17)]
]
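For input in arbitrary order, scanning every existing group for each date gets slower as the number of months grows. A possible alternative, sketched here with a dictionary keyed on (year, month) so that each date is placed in a single lookup (split_by_month is a made-up name, and keying on the year as well is my assumption, so that April 2017 and April 2018 stay apart):

```python
import datetime
from collections import defaultdict


def split_by_month(dates):
    """Group dates by (year, month); groups appear in first-seen order."""
    buckets = defaultdict(list)
    for d in dates:
        buckets[(d.year, d.month)].append(d)
    # Regular dicts keep insertion order in Python 3.7+, and defaultdict inherits that.
    return list(buckets.values())


# Hypothetical unordered sample spanning two years.
sample = [datetime.date(2017, 5, 3), datetime.date(2017, 4, 11),
          datetime.date(2017, 5, 4), datetime.date(2018, 4, 1)]
result = split_by_month(sample)
```

Unlike a sort-based approach, the groups come back in the order their month first appears, so sort the input first if chronological order matters.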

Timing the first and second functions against the pandas answer provided by @Tom83B:

Repeated: 100,000x
    First Function:   0.10295674900044105  seconds
    Second Function:  1.5613631390006049   seconds
    Pandas Function:  146.28389169599905   seconds
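For anyone who wants to rerun a comparison like this, here is a rough sketch using the standard timeit module. It is not the script that produced the numbers above. The function names and the repeat count are my own choices, it only compares the adjacent-month loop with itertools.groupby, and absolute timings will differ from machine to machine:

```python
import datetime
import timeit
from itertools import groupby

# Same dates as in the question, kept in a plain list.
dates = [datetime.date(2017, m, d) for m, d in [
    (4, 11), (4, 12), (4, 20), (4, 25), (5, 3), (5, 4), (5, 10), (5, 11),
    (6, 1), (6, 13), (6, 15), (7, 10), (7, 13), (7, 17)]]


def adjacent_loop(dates):
    """Single pass; assumes dates of the same month are adjacent."""
    groups = []
    for d in dates:
        if groups and groups[-1][0].month == d.month:
            groups[-1].append(d)
        else:
            groups.append([d])
    return groups


def with_groupby(dates):
    """itertools.groupby on (year, month); same adjacency assumption."""
    return [list(g) for _, g in groupby(dates, key=lambda d: (d.year, d.month))]


for fn in (adjacent_loop, with_groupby):
    # number=1000 keeps the run short; raise it for steadier measurements.
    seconds = timeit.timeit(lambda: fn(dates), number=1000)
    print(f"{fn.__name__}: {seconds:.4f} s for 1,000 calls")
```

Both functions return the same four lists for this input, so the comparison is like for like.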

Upvotes: 2
