Reputation: 356
I have time series data in pandas, and I would like to group by a certain time window in each year and calculate the min and max within it.
For example:
times = pd.date_range(start = '1/1/2011', end = '1/1/2016', freq = 'D')
df = pd.DataFrame(np.random.rand(len(times)), index=times, columns=["value"])
How can I group by a time window, e.g. 'Jan-10':'Mar-21', for each year and calculate the min and max of the column value?
Upvotes: 3
Views: 6749
Reputation: 59519
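You can define the bin edges, then throw out the bins you don't need (every other one) with .loc[::2, :]. Here I'll define two functions just to check that we're getting the date ranges we want within the groups. Note that since the left edges are open, you need to subtract 1 day from the start date; the edges below select Feb-10 to Mar-21, so use f'{year}-01-09' as the left edge for a window starting on Jan-10.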
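import pandas as pd

# Two edges per year: the day before the window opens (left edge is open) and its last day
edges = pd.to_datetime([x for year in df.index.year.unique()
                        for x in [f'{year}-02-09', f'{year}-03-21']])

# Helpers to check which dates actually land in each bin
def min_idx(x):
    return x.index.min()

def max_idx(x):
    return x.index.max()

# Even-numbered bins are the windows; odd-numbered bins are the gaps between them
df.groupby(pd.cut(df.index, bins=edges)).agg([min_idx, max_idx, 'min', 'max']).loc[::2, :]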
                              value
                            min_idx     max_idx       min       max
(2011-02-09, 2011-03-21]  2011-02-10  2011-03-21  0.009343  0.990564
(2012-02-09, 2012-03-21]  2012-02-10  2012-03-21  0.026369  0.978470
(2013-02-09, 2013-03-21]  2013-02-10  2013-03-21  0.039491  0.946481
(2014-02-09, 2014-03-21]  2014-02-10  2014-03-21  0.029161  0.967490
(2015-02-09, 2015-03-21]  2015-02-10  2015-03-21  0.006877  0.969296
(2016-02-09, 2016-03-21]         NaT         NaT       NaN       NaN
Upvotes: 0
Reputation: 134
I'm not sure if there's a direct way to do it without first creating a flag for the required days. The following function creates that flag:
# Function for flagging the days required (Jan-10 to Mar-21)
def flag(x):
    if x.month == 1 and x.day >= 10:
        return True
    elif x.month == 2:
        return True
    elif x.month == 3 and x.day <= 21:
        return True
    else:
        return False
Since you need the result for each year, it is a good idea to have the year as a column. The min and max for the given period in each year can then be obtained with the code below:
import numpy as np
import pandas as pd

times = pd.date_range(start='1/1/2011', end='1/1/2016', freq='D')
df = pd.DataFrame(np.random.rand(len(times)), index=times, columns=["value"])
df['Year'] = df.index.year
# Keep only the flagged days, then take the min and max of value per year
pd.pivot_table(df[list(pd.Series(df.index).apply(flag))], values=['value'], index=['Year'], aggfunc=['min', 'max'])
The output has one row per year, with the min and max of value over the flagged days.
Hope that answers your question... :)
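The same flag can probably be built without a row-by-row function. The sketch below is one possible vectorized variant, not part of the original answer: it encodes each date as month * 100 + day and keeps the days between 110 (Jan-10) and 321 (Mar-21). The fixed seed and the names month_day, in_window, window and result are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Same setup as in the question, with a fixed seed (an assumption, for reproducibility)
rng = np.random.default_rng(0)
times = pd.date_range(start="2011-01-01", end="2016-01-01", freq="D")
df = pd.DataFrame(rng.random(len(times)), index=times, columns=["value"])

# Encode each date as month * 100 + day, so Jan-10 -> 110 and Mar-21 -> 321
month_day = df.index.month * 100 + df.index.day
in_window = (month_day >= 110) & (month_day <= 321)

# Keep only the days inside the window, then aggregate per calendar year
window = df[in_window]
result = window.groupby(window.index.year)["value"].agg(["min", "max"])
print(result)
```

Because the comparison only looks at month and day, the same mask applies to every year, including leap years.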
Upvotes: 2
Reputation: 239
You can use the resample method, which aggregates over consecutive fixed-length windows (5 days here):
df.resample('5D').agg(['min', 'max'])
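To see what this produces, here is a small sketch on the question's data (the fixed seed and the name binned are assumptions for illustration). It suggests that resample splits the whole series into back-to-back 5-day bins starting at the first date, so it does not by itself pick out one calendar window per year such as Jan-10 to Mar-21.

```python
import numpy as np
import pandas as pd

# Same setup as in the question, with a fixed seed (an assumption, for reproducibility)
rng = np.random.default_rng(0)
times = pd.date_range(start="2011-01-01", end="2016-01-01", freq="D")
df = pd.DataFrame(rng.random(len(times)), index=times, columns=["value"])

# One row per consecutive 5-day bin, labelled by the bin's first day
binned = df.resample("5D").agg(["min", "max"])
print(binned.head())
```

If a specific window per year is needed, the rows would still have to be filtered by date before or after resampling.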
Upvotes: 3