Reputation: 81
I have a DataFrame like the one shown in the figures, and I want to:
group the rows by calendar day across several years (2006 to 2010; for example, one group would be 2006-01-01, 2007-01-01, 2008-01-01, 2009-01-01, 2010-01-01), then
take the mean of the Pressure column over the days in each group, and then
create a new column containing just that mean Pressure value.
Upvotes: 0
Views: 55
Reputation: 35115
Since you want to keep the original number of rows, group by month and day and use transform. Unlike a plain aggregation, transform returns one value per input row, so the group mean can be assigned directly as a new column.
import pandas as pd
import numpy as np
# Reproducible sample data: one row per day from 2006-01-01 through 2011-01-01 (1827 days).
np.random.seed(20210430)
dates = pd.date_range('2006-01-01', '2011-01-01', freq='D')
df = pd.DataFrame({'Date': dates, 'Pressure': np.random.randint(40, 60, len(dates))})
# Check the numbers beforehand.
df[(df.Date.dt.month == 1) & (df.Date.dt.day == 1)]
           Date  Pressure
0    2006-01-01        51
365  2007-01-01        45
730  2008-01-01        53
1096 2009-01-01        45
1461 2010-01-01        59
1826 2011-01-01        40
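# Mean Pressure per (month, day) across all years in the data, written back to every row.
df['mean'] = df.groupby([df.Date.dt.month, df.Date.dt.day])['Pressure'].transform('mean')
# Check the same rows again after adding the column.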
df[(df.Date.dt.month == 1) & (df.Date.dt.day == 1)]
           Date  Pressure       mean
0    2006-01-01        51  48.833333
365  2007-01-01        45  48.833333
730  2008-01-01        53  48.833333
1096 2009-01-01        45  48.833333
1461 2010-01-01        59  48.833333
1826 2011-01-01        40  48.833333
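Note that the sample data runs through 2011-01-01, so the January 1 mean above is taken over six years. If only 2006 to 2010 should count, as in the question, one option is to filter by year first. The following is a minimal sketch under that assumption; the names in_range, subset, keys and jan1 are only illustrative.

```python
import numpy as np
import pandas as pd

# Same synthetic data as in the answer (seed and value range copied from it).
np.random.seed(20210430)
dates = pd.date_range('2006-01-01', '2011-01-01', freq='D')
df = pd.DataFrame({'Date': dates,
                   'Pressure': np.random.randint(40, 60, len(dates))})

# Assumption: only the years 2006-2010 should contribute to the mean.
in_range = df['Date'].dt.year.between(2006, 2010)
subset = df[in_range].copy()

# transform returns one value per input row, so the row count is preserved.
keys = [subset['Date'].dt.month, subset['Date'].dt.day]
subset['mean'] = subset.groupby(keys)['Pressure'].transform('mean')

# Inspect January 1 of each remaining year.
jan1 = subset[(subset['Date'].dt.month == 1) & (subset['Date'].dt.day == 1)]
print(jan1)
```

With the five January 1 values shown above for 2006 to 2010 (51, 45, 53, 45, 59), the mean works out to 253 / 5 = 50.6. Keep in mind that February 29 occurs only once in this range (2008), so its "mean" is just that single day's value.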
Upvotes: 1