lee.edward01

Reputation: 473

Pandas/Python Pulling end of month rows from dataframe into separate dataframe

Currently I have a time series data frame as follows:

dfMain =

          Date  Portfolio Value
0   2016-07-01     1.000000e+06
1   2016-07-08     1.025168e+06
2   2016-07-15     1.028053e+06
3   2016-07-22     1.024184e+06
4   2016-07-29     1.022491e+06
5   2016-08-05     1.023241e+06
6   2016-08-12     1.030325e+06
7   2016-08-19     1.032742e+06
8   2016-08-26     1.032567e+06
9   2016-09-02     1.028614e+06
10  2016-09-09     9.930876e+05
11  2016-09-16     9.956875e+05
12  2016-09-23     1.010174e+06
13  2016-09-30     1.010388e+06
14  2016-10-07     1.004989e+06
15  2016-10-14     9.924929e+05
16  2016-10-21     9.969708e+05
17  2016-10-28     9.816373e+05
18  2016-11-04     9.563689e+05
19  2016-11-11     9.869579e+05
20  2016-11-18     9.936929e+05
21  2016-11-25     1.009625e+06 

The dataframe can change, so I can't just hard-code the row positions from this example. What would be the best way to pull the rows whose dates are closest to the end of each month? For example, index 4 (2016-07-29) would be pulled because it is the last date in July.

Any tips would be greatly appreciated!

Upvotes: 0

Views: 56

Answers (2)

cs95

Reputation: 402323

Group on the month number and find the last record:

df.Date = pd.to_datetime(df.Date, errors='coerce')
df.groupby(df.Date.dt.month).last()

           Date  Portfolio Value
Date                            
7    2016-07-29        1022491.0
8    2016-08-26        1032567.0
9    2016-09-30        1010388.0
10   2016-10-28         981637.3
11   2016-11-25        1009625.0

If rows aren't sorted by Date, call sort_values first:

df.sort_values('Date').groupby(df.Date.dt.month).last()

           Date  Portfolio Value
Date                            
7    2016-07-29        1022491.0
8    2016-08-26        1032567.0
9    2016-09-30        1010388.0
10   2016-10-28         981637.3
11   2016-11-25        1009625.0

This works regardless of row order, provided all dates fall within a single year.

If your dates span multiple years, group on both the year and the month so that the same month in different years is not merged:

df.sort_values('Date').groupby([df.Date.dt.year, df.Date.dt.month]).last()
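As a rough sketch of the multi-year case, the example below uses made-up sample data (not the question's) and a single year-month key built with `dt.to_period('M')` instead of the two-key list above. The variable names `sorted_df` and `month_end` are only illustrative.

```python
import pandas as pd

# Hypothetical sample data spanning two years (values are made up)
df = pd.DataFrame({
    "Date": ["2016-12-23", "2016-12-30", "2017-01-06", "2017-12-22", "2017-12-29"],
    "Portfolio Value": [1.00e6, 1.01e6, 1.02e6, 1.03e6, 1.04e6],
})
df["Date"] = pd.to_datetime(df["Date"])

# Sort first so that "last" really means "latest date in the group"
sorted_df = df.sort_values("Date")

# A year-month Period keeps December 2016 and December 2017 in separate groups
month_end = sorted_df.groupby(sorted_df["Date"].dt.to_period("M")).last()
print(month_end)
```

With this key the result is indexed by year-month period rather than by a (year, month) pair, which may be easier to read or slice.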

Upvotes: 2

harpan

Reputation: 8631

Sort by date, group by month, and take the last row of each group. Building the result from the selected rows keeps the original row index:

df['Date'] = pd.to_datetime(df['Date'])
grp = df.sort_values('Date').groupby(df['Date'].dt.month)
pd.DataFrame([grp.get_group(x).iloc[-1] for x in grp.groups])

Output:

        Date    Portfolio Value
4   2016-07-29  1022491.0
8   2016-08-26  1032567.0
13  2016-09-30  1010388.0
17  2016-10-28  981637.3
21  2016-11-25  1009625.0
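A possible shorter variant that also keeps the original row index is `groupby(...).tail(1)`. This is a sketch only, rebuilt from the dates and values in the question; the year-month key and the name `month_end` are assumptions added here, not part of the answer above.

```python
import pandas as pd

# Data copied from the question
df = pd.DataFrame({
    "Date": [
        "2016-07-01", "2016-07-08", "2016-07-15", "2016-07-22", "2016-07-29",
        "2016-08-05", "2016-08-12", "2016-08-19", "2016-08-26",
        "2016-09-02", "2016-09-09", "2016-09-16", "2016-09-23", "2016-09-30",
        "2016-10-07", "2016-10-14", "2016-10-21", "2016-10-28",
        "2016-11-04", "2016-11-11", "2016-11-18", "2016-11-25",
    ],
    "Portfolio Value": [
        1.000000e+06, 1.025168e+06, 1.028053e+06, 1.024184e+06, 1.022491e+06,
        1.023241e+06, 1.030325e+06, 1.032742e+06, 1.032567e+06,
        1.028614e+06, 9.930876e+05, 9.956875e+05, 1.010174e+06, 1.010388e+06,
        1.004989e+06, 9.924929e+05, 9.969708e+05, 9.816373e+05,
        9.563689e+05, 9.869579e+05, 9.936929e+05, 1.009625e+06,
    ],
})
df["Date"] = pd.to_datetime(df["Date"])

# Sort, group by year-month, and keep the final row of each group;
# tail(1) returns the original rows, so the original index labels survive
sorted_df = df.sort_values("Date")
month_end = sorted_df.groupby(sorted_df["Date"].dt.to_period("M")).tail(1)
print(month_end)
```

Unlike `.last()`, `tail(1)` does not skip missing values column by column; it returns whole rows exactly as they appear in the frame.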

Upvotes: 1
