Reputation: 1282
I have a fairly simple question but can't find a clean pandas solution to it.
Given a list of dates in a series like below:
LoadedDate
0 2016-02-18
1 2016-02-19
2 2016-02-20
3 2016-02-23
4 2016-02-24
5 2016-02-25
6 2016-02-26
7 2016-02-27
8 2016-03-01
9 2016-03-02
10 2016-03-03
11 2016-03-04
12 2016-03-05
13 2016-03-08
14 2016-03-09
15 2016-03-10
16 2016-03-11
17 2016-03-12
18 2016-03-15
19 2016-03-16
20 2016-03-17
21 2016-03-18
22 2016-03-19
23 2016-03-22
24 2016-03-23
25 2016-03-24
26 2016-03-25
27 2016-03-30
28 2016-03-31
29 2016-04-01
30 2016-04-02
31 2016-04-05
32 2016-04-06
33 2016-04-07
34 2016-04-08
35 2016-04-09
36 2016-04-12
37 2016-04-13
38 2016-04-14
39 2016-04-15
40 2016-04-16
41 2016-04-19
42 2016-04-20
43 2016-04-21
44 2016-04-22
45 2016-04-23
46 2016-04-27
47 2016-04-28
48 2016-04-29
49 2016-04-30
50 2016-05-02
51 2016-05-03
52 2016-05-04
I'd like to pull the last/max date of each month. So the output would be:
LastDate
0 2016-02-27
1 2016-03-31
2 2016-04-30
3 2016-05-04
I tried df.set_index('LoadedDate').groupby(pd.Grouper(freq='M')).max()
but it returned the calendar month-end dates, not the latest loaded date actually present in my series for each month.
Thanks.
Upvotes: 1
Views: 488
Reputation: 11
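You can try the following code.
Create a new column holding the year-month prefix (the cast to str makes this work whether LoadedDate holds strings or datetimes):
df['new_loadeddate'] = df['LoadedDate'].astype(str).str[:7]
Now group by that month key and take the max date in each group:
grouped_df = df.groupby('new_loadeddate')['LoadedDate'].max()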
Upvotes: 1
Reputation: 76917
You could use
In [300]: df.groupby(df.LoadedDate.astype('datetime64[M]')).last().reset_index(drop=True)
Out[300]:
LoadedDate
0 2016-02-27
1 2016-03-31
2 2016-04-30
3 2016-05-04
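A note on portability: as far as I know, recent pandas releases (2.x) only support second-to-nanosecond units for datetime columns, so the `astype('datetime64[M]')` cast may raise a TypeError there. Below is a minimal sketch of an equivalent month key built with `dt.to_period('M')` and converted back to a timestamp. The small sample frame and the names `month_start` and `result` are made up for illustration.

```python
import pandas as pd

# Hypothetical sample: a handful of the dates from the question, already sorted.
df = pd.DataFrame({"LoadedDate": pd.to_datetime(
    ["2016-02-26", "2016-02-27", "2016-03-30", "2016-03-31", "2016-04-30"]
)})

# Month key: the first day of each row's month, as a regular timestamp.
month_start = df["LoadedDate"].dt.to_period("M").dt.to_timestamp().rename("month")

# last() picks the final row per month, so it relies on the rows being sorted by date.
result = df.groupby(month_start).last().reset_index(drop=True)
print(result)
```

The grouping key plays the same role as the `datetime64[M]` cast; the remaining alternatives in this answer do not depend on that cast either.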
Or,
In [295]: df.groupby(df.LoadedDate - pd.offsets.MonthEnd()).last().reset_index(drop=True)
Out[295]:
LoadedDate
0 2016-02-27
1 2016-03-31
2 2016-04-30
3 2016-05-04
Or,
In [301]: df.groupby(df.LoadedDate.dt.to_period('M')).last().reset_index(drop=True)
Out[301]:
LoadedDate
0 2016-02-27
1 2016-03-31
2 2016-04-30
3 2016-05-04
Or,
In [303]: df.groupby(df.LoadedDate.astype(str).str[:7]).last().reset_index(drop=True)
Out[303]:
LoadedDate
0 2016-02-27
1 2016-03-31
2 2016-04-30
3 2016-05-04
If the dates are not sorted, last() will not necessarily return the latest date. In that case, combine any of the grouping keys above with idxmax and loc:
In [307]: df.loc[df.groupby(df.LoadedDate.astype(str).str[:7]).LoadedDate.idxmax().values]
Out[307]:
LoadedDate
7 2016-02-27
28 2016-03-31
49 2016-04-30
52 2016-05-04
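To see the unsorted case end to end, here is a small self-contained sketch. It swaps in the `to_period('M')` key instead of the string slice, and the shuffled sample data and the names `month` and `last_rows` are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical sample, deliberately out of date order.
df = pd.DataFrame({"LoadedDate": pd.to_datetime(
    ["2016-03-31", "2016-02-18", "2016-04-30", "2016-02-27", "2016-03-01"]
)})

# One period label per row, e.g. 2016-02, 2016-03, ...
month = df["LoadedDate"].dt.to_period("M").rename("month")

# idxmax gives, for each month, the index label of the row with the latest date;
# loc then pulls those rows out of the original frame.
last_rows = df.loc[df.groupby(month)["LoadedDate"].idxmax()]
print(last_rows)
```

The original index labels are preserved, which helps when the frame has other columns you want to keep. If only the dates matter, `df.groupby(month)["LoadedDate"].max()` is a shorter option that is also independent of row order.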
Upvotes: 7