Tamarie

Reputation: 135

Averaging values of dates in dataframe

I have the following dataframe

import pandas as pd

data = pd.DataFrame({
    'date': ['1988/01/12', '1988/01/13', '1988/01/14', '1989/01/20', '1990/01/01'],
    'value': [11558522, 12323552, 13770958, 18412280, 13770958]
})

Is there a way in Python to average the values for a whole month and make that the new value for that month? For example, I want to average the 1988-01 values and make that the final value for 1988-01. I tried the groupby method, but that didn't work:

new_df=data.groupby(['date']).mean()

Upvotes: 2

Views: 56

Answers (2)

faquimbayal

Reputation: 23

df=pd.read_csv("data .csv",encoding='ISO-8859-1', parse_dates=["datetime"])
print(df)
print(df.dtypes)

                 datetime  Temperature
0     1987-11-01 07:00:00         21.4
1     1987-11-01 13:00:00         27.4
2     1987-11-01 19:00:00         25.0
3     1987-11-02 07:00:00         22.0
4     1987-11-02 13:00:00         27.6
...                   ...          ...
27554 2020-03-30 13:00:00         24.8
27555 2020-03-30 18:00:00         23.8
27556 2020-03-31 07:00:00         23.4
27557 2020-03-31 13:00:00         24.6
27558 2020-03-31 18:00:00         26.4

df1=df.groupby(pd.Grouper(key='datetime',freq='D')).mean()

            Temperature
datetime
1987-11-01    24.600000
1987-11-02    25.066667
1987-11-03    24.466667
1987-11-04    22.533333
1987-11-05    25.066667
...                 ...
2020-03-27    26.533333
2020-03-28    27.666667
2020-03-29    27.733333
2020-03-30    24.266667
2020-03-31    24.800000
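
The example above averages per day. For the monthly averages the question asks about, the same pd.Grouper idea could probably be applied with a monthly frequency. A minimal sketch, assuming the question's data and the 'MS' (month start) alias; the name monthly is only illustrative:

```python
import pandas as pd

# Data copied from the question
data = pd.DataFrame({
    'date': ['1988/01/12', '1988/01/13', '1988/01/14', '1989/01/20', '1990/01/01'],
    'value': [11558522, 12323552, 13770958, 18412280, 13770958],
})

# Grouper needs a real datetime column, not strings
data['date'] = pd.to_datetime(data['date'])

# One bin per calendar month, labelled by the first day of the month;
# months without any rows come out as NaN, so drop them
monthly = data.groupby(pd.Grouper(key='date', freq='MS')).mean().dropna()
print(monthly)
```

Without the dropna() call the result would also contain a NaN row for every empty month between the first and last date.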

Upvotes: 1

jezrael

Reputation: 863501

Use month periods created by Series.dt.to_period:

data['date'] = pd.to_datetime(data['date'])

new_df=data.groupby(data['date'].dt.to_period('M')).mean()
print (new_df)
                value
date                 
1988-01  1.255101e+07
1989-01  1.841228e+07
1990-01  1.377096e+07

Or use DataFrame.resample and if necessary remove missing values:

new_df=data.resample('MS', on='date').mean().dropna()
print (new_df)
                   value
date                    
1988-01-01  1.255101e+07
1989-01-01  1.841228e+07
1990-01-01  1.377096e+07

Or you can use months and years separately for MultiIndex:

new_df=data.groupby([data['date'].dt.year.rename('y'),
                     data['date'].dt.month.rename('m')]).mean()
print (new_df)

               value
y    m              
1988 1  1.255101e+07
1989 1  1.841228e+07
1990 1  1.377096e+07
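
If the goal is instead to keep every original row and overwrite its value with the monthly mean (one possible reading of "make that the new value for that month"), groupby(...).transform may be an option. A sketch under that assumption, using the question's data; the helper name month is only illustrative:

```python
import pandas as pd

# Data copied from the question
data = pd.DataFrame({
    'date': ['1988/01/12', '1988/01/13', '1988/01/14', '1989/01/20', '1990/01/01'],
    'value': [11558522, 12323552, 13770958, 18412280, 13770958],
})
data['date'] = pd.to_datetime(data['date'])

# Month period of each row, used only as the grouping key
month = data['date'].dt.to_period('M')

# transform returns one value per original row, so the frame keeps its
# shape and every row of a month receives that month's mean
data['value'] = data.groupby(month)['value'].transform('mean')
print(data)
```

Unlike the aggregated results above, this keeps all five rows; the three 1988-01 rows end up sharing the same averaged value.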

Upvotes: 1
