Hugo V

Reputation: 81

Group by just day in a date_time column and take the mean

I have a df like the one shown in the figures and I want to:

  1. group by the same calendar day across several years (2006 to 2010, for example: group(2006-01-01, 2007-01-01, 2008-01-01, 2009-01-01, 2010-01-01)), and then

  2. take the mean value of the Pressure column for these days (2006-01-01, 2007-01-01, 2008-01-01, 2009-01-01, 2010-01-01), and then

  3. generate a new column containing just the mean of the Pressure values

[Figures: df, df (continuation)]

Upvotes: 0

Views: 55

Answers (1)

r-beginners

Reputation: 35115

In this case, we need to keep the original number of rows while averaging over each month and day combination. This can be done with groupby followed by transform, which returns one value per original row, so the result can be added as a new column.

import pandas as pd
import numpy as np

np.random.seed(20210430)
# Daily dates from 2006-01-01 through 2011-01-01 inclusive (1827 rows).
df = pd.DataFrame({
    'Date': pd.date_range('2006-01-01', '2011-01-01', freq='D'),
    'Pressure': np.random.randint(40, 60, (1827,)),
})

# Check the numbers beforehand.
df[(df.Date.dt.month == 1) & (df.Date.dt.day == 1)]

           Date  Pressure
0    2006-01-01        51
365  2007-01-01        45
730  2008-01-01        53
1096 2009-01-01        45
1461 2010-01-01        59
1826 2011-01-01        40

df['mean'] = df.groupby([df.Date.dt.month, df.Date.dt.day])['Pressure'].transform('mean')

# Numerical verification after the fact
df[(df.Date.dt.month == 1) & (df.Date.dt.day == 1)]

           Date  Pressure       mean
0    2006-01-01        51  48.833333
365  2007-01-01        45  48.833333
730  2008-01-01        53  48.833333
1096 2009-01-01        45  48.833333
1461 2010-01-01        59  48.833333
1826 2011-01-01        40  48.833333
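
Note that the sample data above runs through 2011-01-01, so the mean for January 1 covers six values. If only 2006 to 2010 should count, as the question describes, one option is to filter by year before grouping. The following is a minimal sketch on hypothetical toy data, not the asker's actual df; the names in_range, subset, keys and per_day are illustrative only.

```python
import pandas as pd

# Hypothetical toy data: the same two calendar days across several years.
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2006-01-01", "2007-01-01", "2011-01-01",
        "2006-01-02", "2007-01-02", "2011-01-02",
    ]),
    "Pressure": [50, 54, 10, 40, 48, 20],
})

# Keep only the years of interest (2006 through 2010, inclusive).
in_range = df["Date"].dt.year.between(2006, 2010)
subset = df[in_range].copy()

# Group keys: month and day of each date, ignoring the year.
keys = [
    subset["Date"].dt.month.rename("month"),
    subset["Date"].dt.day.rename("day"),
]

# transform returns one value per original row, so it fits as a new column.
subset["mean"] = subset.groupby(keys)["Pressure"].transform("mean")

# A plain mean collapses to one row per (month, day) instead.
per_day = subset.groupby(keys)["Pressure"].mean()
```

Here subset keeps one row per original date with the group mean repeated, while per_day holds a single value per (month, day) pair. Which of the two is more useful depends on whether the mean is needed alongside the original rows.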

Upvotes: 1
