Jason

Reputation: 356

Group by date range in pandas dataframe

I have time series data in pandas, and I would like to group by a certain time window in each year and calculate the min and max within it.

For example:

import numpy as np
import pandas as pd

times = pd.date_range(start = '1/1/2011', end = '1/1/2016', freq = 'D')
df = pd.DataFrame(np.random.rand(len(times)), index=times, columns=["value"])

How can I group by a time window, e.g. 'Jan-10':'Mar-21', for each year and calculate the min and max of the column value?

Upvotes: 3

Views: 6749

Answers (3)

ALollz

Reputation: 59519

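You can define the bin edges, then discard the bins you don't need (every other one) with .loc[::2, :]. Here I also define two helper functions just to check that each group covers the date range we want. Note that the left edge of each bin is open, so it has to be one day before the first day you want. The example below covers Feb 10 to Mar 21, so adjust the edge strings for a different window: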

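import pandas as pd

# df is the DataFrame from the question.
# Two edges per year: the day before the window starts and the last day of the window.
edges = pd.to_datetime([x for year in df.index.year.unique()
                        for x in [f'{year}-02-09', f'{year}-03-21']])

# Helpers that report the first and last date that fell into each bin.
def min_idx(x):
    return x.index.min()

def max_idx(x):
    return x.index.max()

# observed=False keeps empty bins, so [::2] still selects the wanted windows.
(df.groupby(pd.cut(df.index, bins=edges), observed=False)
   .agg([min_idx, max_idx, 'min', 'max'])
   .loc[::2, :])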

Output:

                              value                               
                            min_idx    max_idx       min       max
(2011-02-09, 2011-03-21] 2011-02-10 2011-03-21  0.009343  0.990564
(2012-02-09, 2012-03-21] 2012-02-10 2012-03-21  0.026369  0.978470
(2013-02-09, 2013-03-21] 2013-02-10 2013-03-21  0.039491  0.946481
(2014-02-09, 2014-03-21] 2014-02-10 2014-03-21  0.029161  0.967490
(2015-02-09, 2015-03-21] 2015-02-10 2015-03-21  0.006877  0.969296
(2016-02-09, 2016-03-21]        NaT        NaT       NaN       NaN
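
To make the open left edge easier to see, here is a minimal sketch on deterministic toy data. The frame `toy` and its day-of-year values are made up for illustration, and only two years are used. Each value equals the day of the year, so the min and max show exactly which days landed in each window.

```python
import pandas as pd

# Hypothetical toy data: the value of each row is its day of the year.
idx = pd.date_range("2011-01-01", "2012-12-31", freq="D")
toy = pd.DataFrame({"value": idx.dayofyear.to_numpy().astype(float)}, index=idx)

# pd.cut builds right-closed bins (left, right], so the left edge is the day
# before the first wanted day (Feb 9 for a window that starts on Feb 10).
edges = pd.to_datetime(["2011-02-09", "2011-03-21", "2012-02-09", "2012-03-21"])
bins = pd.cut(toy.index, bins=edges)

# Every other bin is the gap between two windows, so keep bins 0, 2, ...
# observed=False keeps all bins so the positional step stays aligned.
out = toy.groupby(bins, observed=False)["value"].agg(["min", "max"]).iloc[::2]
print(out)
```

Feb 10 is day 41 of the year and Mar 21 is day 80 (81 in the leap year 2012), so the min of 41 confirms that Feb 9 itself is excluded.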

Upvotes: 0

Arpan

Reputation: 134

I'm not sure there's a direct way to do it without first creating a flag for the required days. The following function creates that flag:

# Flag the days inside the window from the question (Jan 10 to Mar 21)
def flag(x):
    if x.month == 1 and x.day >= 10: return True
    elif x.month == 2: return True
    elif x.month == 3 and x.day <= 21: return True
    else: return False

Since you need the result for each year, it helps to have the year as a column. The min and max for the flagged period in each year can then be obtained with the code below:

import numpy as np
import pandas as pd

times = pd.date_range(start = '1/1/2011', end = '1/1/2016', freq = 'D')
df = pd.DataFrame(np.random.rand(len(times)), index=times, columns=["value"])
df['Year'] = df.index.year
# Keep only the flagged rows, then aggregate per year
pd.pivot_table(df[list(pd.Series(df.index).apply(flag))], values=['value'], index = ['Year'], aggfunc=['min','max'])

The output will look as follows: [image: Sample Output]
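
The same kind of flag can also be built without calling a Python function on every row. The following is a minimal sketch for the question's Jan 10 to Mar 21 window. The names `month_day`, `in_window`, `window`, and `result` are illustrative, and the month * 100 + day encoding is just one convenient way to compare dates within a year.

```python
import numpy as np
import pandas as pd

times = pd.date_range(start="1/1/2011", end="1/1/2016", freq="D")
df = pd.DataFrame(np.random.rand(len(times)), index=times, columns=["value"])

# Encode each date as month * 100 + day, so Jan 10 -> 110 and Mar 21 -> 321.
month_day = df.index.month * 100 + df.index.day
in_window = (month_day >= 110) & (month_day <= 321)

# Keep only the rows inside the window, then aggregate per calendar year.
window = df[in_window]
result = window.groupby(window.index.year)["value"].agg(["min", "max"])
print(result)
```

Because 2016-01-01 falls outside the window, the result has rows for 2011 through 2015 only.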

Hope that answers your question... :)

Upvotes: 2

Dallas Lindauer

Reputation: 239

You can use the resample method, which groups the data into consecutive fixed-length windows (5 days here):

df.resample('5D').agg(['min','max'])
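
For reference, here is a small sketch of what this produces. The 12-row frame `toy` is made up for illustration. The bins run back to back from the first day of the data, so on its own this does not isolate a calendar window such as Jan 10 to Mar 21 in each year.

```python
import pandas as pd

# Hypothetical toy data: 12 consecutive days with values 0..11.
idx = pd.date_range("2011-01-01", periods=12, freq="D")
toy = pd.DataFrame({"value": range(12)}, index=idx)

# Consecutive 5-day bins starting at 2011-01-01, 2011-01-06 and 2011-01-11.
binned = toy.resample("5D").agg(["min", "max"])
print(binned)
```

The last bin holds only two days (values 10 and 11) because the data ends before the bin does.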

Upvotes: 3
