Reputation: 189
I have the following array (it could be a list as well):
import datetime
import numpy as np
uniqueDates = np.array([datetime.date(2017, 4, 11), datetime.date(2017, 4, 12),
datetime.date(2017, 4, 20), datetime.date(2017, 4, 25),
datetime.date(2017, 5, 3), datetime.date(2017, 5, 4),
datetime.date(2017, 5, 10), datetime.date(2017, 5, 11),
datetime.date(2017, 6, 1), datetime.date(2017, 6, 13),
datetime.date(2017, 6, 15), datetime.date(2017, 7, 10),
datetime.date(2017, 7, 13), datetime.date(2017, 7, 17)])
I want to split this array into 4 lists, each containing the dates from one month (April, May, June, and July). So the intended result looks something like this:
monthsList = [[datetime.date(2017, 4, 11),
datetime.date(2017, 4, 12),
datetime.date(2017, 4, 20),
datetime.date(2017, 4, 25)],
[datetime.date(2017, 5, 3),
datetime.date(2017, 5, 4),
datetime.date(2017, 5, 10),
datetime.date(2017, 5, 11)],
[datetime.date(2017, 6, 1),
datetime.date(2017, 6, 13),
datetime.date(2017, 6, 15)],
[datetime.date(2017, 7, 10),
datetime.date(2017, 7, 13),
datetime.date(2017, 7, 17)]]
Is there a function that can do this automatically, or should I loop over the elements and check them individually? I am looking for an efficient way to accomplish this task. I searched a few questions here on Stack Overflow but could not find what I am looking for.
Upvotes: 3
Views: 813
Reputation: 42133
You can use groupby from itertools:
from itertools import groupby
grouped = [[*g] for _, g in groupby(uniqueDates, key=lambda d: (d.year, d.month))]
print(*(", ".join(map(str, g)) for g in grouped), sep="\n")
2017-04-11, 2017-04-12, 2017-04-20, 2017-04-25
2017-05-03, 2017-05-04, 2017-05-10, 2017-05-11
2017-06-01, 2017-06-13, 2017-06-15
2017-07-10, 2017-07-13, 2017-07-17
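This works even if your input is a plain Python list; you shouldn't use numpy unless it is necessary. Note that groupby only merges consecutive items with the same key, so dates from the same month must already be next to each other (as they are here).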
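If the input is not already in chronological order, groupby alone will split one month into several groups. A minimal sketch that sorts first and then groups by the same (year, month) key; the helper name group_by_month and the sample data are hypothetical:

```python
import datetime
from itertools import groupby


def group_by_month(dates):
    """Group dates into lists by (year, month), regardless of input order."""
    key = lambda d: (d.year, d.month)
    # Dates sort chronologically, which makes equal (year, month) keys
    # adjacent; groupby only merges *consecutive* items with equal keys
    return [list(g) for _, g in groupby(sorted(dates), key=key)]


# Deliberately unsorted sample input (hypothetical data)
dates = [
    datetime.date(2017, 5, 3),
    datetime.date(2017, 4, 11),
    datetime.date(2017, 5, 10),
    datetime.date(2017, 4, 20),
]
for group in group_by_month(dates):
    print(", ".join(map(str, group)))
```

Sorting costs O(n log n); if the data is known to be ordered already, the one-liner above is enough.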
Upvotes: 0
Reputation: 2109
You can use pandas:
import pandas as pd
...
s = pd.Series(uniqueDates)
# Each element is a (month, Series) pair
list(s.groupby(s.map(lambda x: x.month)))
Edit: as pointed out by Nidal Barada, his looping approach is significantly faster. Using the %%timeit magic in Jupyter:
pandas: 562 µs ± 3.87 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Nidal Barada's answer: 8.14 µs ± 39.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Upvotes: 2
Reputation: 393
This works as long as dates from the same month are next to each other (less computation):
Dates = []
for i in range(len(uniqueDates)):
    if Dates == []:
        Dates.append([uniqueDates[i]])
    elif uniqueDates[i].month == Dates[-1][0].month:
        Dates[-1].append(uniqueDates[i])
    else:
        Dates.append([uniqueDates[i]])
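The loop above compares only .month, so if the data spanned more than one year, April 2017 dates followed directly by April 2018 dates would land in one group. A minimal variant that iterates over the elements directly and compares (year, month) instead; the function name and sample data are hypothetical:

```python
import datetime


def group_consecutive_months(dates):
    """Group adjacent dates that share the same (year, month)."""
    groups = []
    for d in dates:
        # Extend the last group if it has the same (year, month);
        # otherwise (or if there is no group yet) start a new one
        if groups and (groups[-1][0].year, groups[-1][0].month) == (d.year, d.month):
            groups[-1].append(d)
        else:
            groups.append([d])
    return groups


dates = [
    datetime.date(2017, 4, 11),
    datetime.date(2017, 4, 25),
    datetime.date(2018, 4, 2),  # same month number, different year
]
print(group_consecutive_months(dates))
```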
Otherwise use:
Dates = []
for i in range(len(uniqueDates)):
    if Dates == []:
        Dates.append([uniqueDates[i]])
    else:
        for y in range(len(Dates)):
            if Dates[y][0].month == uniqueDates[i].month:
                Dates[y].append(uniqueDates[i])
                break
            # Reached the last group without a match: start a new group
            if y == len(Dates) - 1:
                Dates.append([uniqueDates[i]])
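The second loop scans every existing group for each date, so its cost grows with the number of groups. A common alternative for unordered input (swapped in here, not part of the original answer) is a dictionary keyed by (year, month), which needs a single lookup per date. A minimal sketch with hypothetical names:

```python
import datetime
from collections import defaultdict


def group_by_month_dict(dates):
    """Group dates by (year, month) in one pass, in any input order."""
    groups = defaultdict(list)  # (year, month) -> list of dates
    for d in dates:
        groups[(d.year, d.month)].append(d)
    # Dicts keep insertion order (Python 3.7+), so groups come out in the
    # order in which their first date was seen
    return list(groups.values())


dates = [
    datetime.date(2017, 6, 1),
    datetime.date(2017, 4, 11),
    datetime.date(2017, 6, 13),
]
print(group_by_month_dict(dates))
```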
Output of both:
[
[datetime.date(2017, 4, 11), datetime.date(2017, 4, 12), datetime.date(2017, 4, 20), datetime.date(2017, 4, 25)],
[datetime.date(2017, 5, 3), datetime.date(2017, 5, 4), datetime.date(2017, 5, 10), datetime.date(2017, 5, 11)],
[datetime.date(2017, 6, 1), datetime.date(2017, 6, 13), datetime.date(2017, 6, 15)],
[datetime.date(2017, 7, 10), datetime.date(2017, 7, 13), datetime.date(2017, 7, 17)]
]
Timing the first and second functions against the pandas answer provided by @Tom83B:
Repeated: 100,000x
First Function: 0.10295674900044105 seconds
Second Function: 1.5613631390006049 seconds
Pandas Function: 146.28389169599905 seconds
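Absolute numbers depend on the machine and Python version, so it may be worth reproducing the comparison locally. A minimal sketch with the standard-library timeit module; first_function and second_function are hypothetical names wrapping condensed, behavior-equivalent versions of the two loops above (the second uses for/else), and the pandas version is omitted:

```python
import datetime
import timeit

# Sample data from the question, as a plain list
uniqueDates = [
    datetime.date(2017, 4, 11), datetime.date(2017, 4, 12),
    datetime.date(2017, 4, 20), datetime.date(2017, 4, 25),
    datetime.date(2017, 5, 3), datetime.date(2017, 5, 4),
    datetime.date(2017, 5, 10), datetime.date(2017, 5, 11),
    datetime.date(2017, 6, 1), datetime.date(2017, 6, 13),
    datetime.date(2017, 6, 15), datetime.date(2017, 7, 10),
    datetime.date(2017, 7, 13), datetime.date(2017, 7, 17),
]


def first_function(dates):
    """Adjacent-month grouping (condensed first loop)."""
    result = []
    for d in dates:
        if result and result[-1][0].month == d.month:
            result[-1].append(d)
        else:
            result.append([d])
    return result


def second_function(dates):
    """Search every existing group (condensed second loop)."""
    result = []
    for d in dates:
        for group in result:
            if group[0].month == d.month:
                group.append(d)
                break
        else:  # no group matched (or none exist yet)
            result.append([d])
    return result


N = 10_000  # repetitions; the timings above used 100,000
for name, func in [("First", first_function), ("Second", second_function)]:
    seconds = timeit.timeit(lambda: func(uniqueDates), number=N)
    print(f"{name} function: {seconds:.4f} seconds")
```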
Upvotes: 2