Rishindra
Rishindra

Reputation: 29

Pandas for each group of 4 calendar months find the date on which maximum value occured

For each group of 4 consecutive calendar months Need to find the date on which maximum value occured

My Dataframe looks like this.

DataFrame

Upvotes: 0

Views: 1233

Answers (3)

Hemanth
Hemanth

Reputation: 81

import numpy as np
from numpy.random import randn as rn
DatetimeIndex =  pd.date_range('1/1/2015', '12/31/2015', freq='B')
np.random.seed(101)
df = pd.DataFrame(rn(len(DatetimeIndex)),index=DatetimeIndex)
df.groupby(pd.Grouper(freq='4MS',label='right')).max()[:3]

2015-05-01 2.706850 2015-09-01 2.302987 2016-01-01 2.493990

Upvotes: 1

LevG
LevG

Reputation: 177

You can apply groupby modulo 4 months on date and agg with max and idxmax to get the corresponding max value and its date per 4 consecutive months.

Here is one line solution assuming that date is index as in your example:

df.groupby(lambda x:(x.year*10 + x.month%4)).agg(['max','idxmax'])

Upvotes: 0

Auberon Vacher
Auberon Vacher

Reputation: 4775

One option is to create a new column with a unique value for each group of four consecutive months. Then sort on this column and on the value, for the later one descending Then only take group on the column with unique value for 4 consecutive months Then only take the first element of the group

from pandas import DataFrame
from datetime import datetime,date

#create a dummy dataframe
values = []
for mo in range(1,13):
    row = {}
    row["date"]=date(2018,mo,1)
    row["val"]=5+0.1*mo
    values.append(row)
df = DataFrame(values)
#create a column which can be ordered and is unique for 
#every group of successive 4 month
#if more than one year is to be considered the lambda function
#should cover this as well
df["quadmester"]=df["date"].apply(lambda x: (x.month-1)/4)
#sort by the added column and then by value
#since the date for which top value in 4 consecutive month
#is considered sort descending on values
df = df.sort_values(by=["quadmester","val"],ascending=[1,0])
#show the dataframe
print df

which yields

      date  val  quadmester

3 2018-04-01 5.4 0
2 2018-03-01 5.3 0
1 2018-02-01 5.2 0
0 2018-01-01 5.1 0
7 2018-08-01 5.8 1
6 2018-07-01 5.7 1
5 2018-06-01 5.6 1
4 2018-05-01 5.5 1
11 2018-12-01 6.2 2
10 2018-11-01 6.1 2
9 2018-10-01 6.0 2
8 2018-09-01 5.9 2

| index | date | val | quadmester|

| 0 |3 | 2018-04-01 | 5.4 | 0 |
| 1| 7 | 2018-08-01 | 5.8 | 1 |
| 2| 11 | 2018-12-01 | 6.2 | 2 |

Upvotes: 0

Related Questions