Reputation: 135
I have the following dataframe
data = pd.DataFrame({
'date': ['1988/01/12', '1988/01/13', '1988/01/14', '1989/01/20','1990/01/01'],
'value': [11558522, 12323552, 13770958, 18412280, 13770958]
})
Is there a way in python that I can average a value for a whole month and make that the new value for that month i.e. I want to average the 1988-01 value and make that the final value for 1988-01. I tried the groupby method but that didnt work
new_df=data.groupby(['date']).mean()
Upvotes: 2
Views: 56
Reputation: 23
df=pd.read_csv("data .csv",encoding='ISO-8859-1', parse_dates=["datetime"]) print(df) print(df.dtypes)
datetime Temperature
0 1987-11-01 07:00:00 21.4 1 1987-11-01 13:00:00 27.4 2 1987-11-01 19:00:00 25.0 3 1987-11-02 07:00:00 22.0 4 1987-11-02 13:00:00 27.6 ... ... 27554 2020-03-30 13:00:00 24.8 27555 2020-03-30 18:00:00 23.8 27556 2020-03-31 07:00:00 23.4 27557 2020-03-31 13:00:00 24.6 27558 2020-03-31 18:00:00 26.4
df1=df.groupby(pd.Grouper(key='datetime',freq='D')).mean()
datetime Temperature
1987-11-01 24.600000
1987-11-02 25.066667
1987-11-03 24.466667
1987-11-04 22.533333
1987-11-05 25.066667
...
2020-03-27 26.533333
2020-03-28 27.666667
2020-03-29 27.733333
2020-03-30 24.266667
2020-03-31 24.800000
Upvotes: 1
Reputation: 863501
Use month periods created by Series.dt.to_period
:
data['date'] = pd.to_datetime(data['date'])
new_df=data.groupby(data['date'].dt.to_period('m')).mean()
print (new_df)
value
date
1988-01 1.255101e+07
1989-01 1.841228e+07
1990-01 1.377096e+07
Or use DataFrame.resample
and if necessary remove missing values:
new_df=data.resample('MS', on='date').mean().dropna()
print (new_df)
value
date
1988-01-01 1.255101e+07
1989-01-01 1.841228e+07
1990-01-01 1.377096e+07
Or you can use months and years separately for MultiIndex
:
new_df=data.groupby([data['date'].dt.year.rename('y'),
data['date'].dt.month.rename('m')]).mean()
print (new_df)
value
y m
1988 1 1.255101e+07
1989 1 1.841228e+07
1990 1 1.377096e+07
Upvotes: 1