GGiacomo

Reputation: 75

Finding the average for the same date in different years

I am working on the following DataFrame:

df
Out[1]: 
              temp_C
date          
2013-01-01    12
2013-01-02    11
2013-01-03    10
2013-01-04     9
2013-01-05    10
2013-01-06    10
2013-01-07    11
2013-01-08    12
2013-01-09    14
2013-01-10    14
2013-01-11    12
2013-01-12    12
2013-01-13    11
2013-01-14    10
2013-01-15    10
2013-01-16    12
2013-01-17    13
...   
2017-01-02     8
2017-01-03     8
2017-01-04     8
2017-01-05     9
2017-01-06     9
2017-01-07    10
2017-01-08    12
2017-01-09    14
2017-01-10    14
2017-01-11    10
2017-01-12    10
2017-01-13    11
2017-01-14    14
2017-01-15    13
2017-01-16    10
2017-01-17     9
[1770 rows x 1 columns]

What I need to do is group the values by the day of the year and find the mean (or median) of each group, thus obtaining a new DataFrame in which the value for each day is the mean/median/... of all the values recorded on that same day across the years.

Here's an example:

df_grouped
Out[2]: 
              temp_C
date
2013-01-01    12
2014-01-01    10
2015-01-01    10
2016-01-01    12
2017-01-01    11
2013-01-02    11
2014-01-02    10
....
2016-12-31    8
2017-12-31    7

df_mean
Out[3]: 
              temp_C
date
1970-01-01    11 #the year is not meaningful anymore
1970-01-02    11.4
1970-01-03    12.5
....
1970-12-30    7.5
1970-12-31    7.5

Thank you.

Upvotes: 1

Views: 1389

Answers (1)

piRSquared

Reputation: 294516

Setup

import pandas as pd

df = pd.DataFrame(
    {'temp_C': range(10)},
    pd.to_datetime([
        '2010-01-23', '2012-03-30',
        '2013-01-23', '2013-03-30',
        '2014-01-23', '2014-03-30',
        '2016-01-23', '2015-03-30',
        '2017-01-23', '2017-03-30',
    ])
)

groupby

df.groupby('{:%m-%d}'.format).mean()

       temp_C
01-23       4
03-30       5

Explanation

Strings have a format method that you can use as a callable. It takes arguments that get processed and interpolated as a new string.

'{:%m-%d}'.format is a callable that takes a single positional argument and formats it according to the specification inside the {}. Here the specification %m-%d is specific to dates: it uses the strftime format codes documented for Python's datetime module, and it says that, given a date, the result should be the zero-padded month and day, such as 01-23.
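To see the callable on its own, here is a minimal sketch using a plain datetime.date from the standard library (the name fmt is only for illustration). The elements of a pandas DatetimeIndex are Timestamp objects, which subclass datetime, so they should respond to the same format specification.

```python
from datetime import date

# The bound method behaves like a one-argument function.
fmt = '{:%m-%d}'.format

# Dates from different years collapse to the same month-day string.
key_2013 = fmt(date(2013, 1, 23))
key_2017 = fmt(date(2017, 1, 23))

print(key_2013)              # 01-23
print(key_2013 == key_2017)  # True
```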

When you pass a callable to groupby, it is applied to each element of the index. Since our index is a DatetimeIndex, each element is mapped to its month-day string, and rows that share that string end up in the same group. That is precisely the grouping we want in order to take our mean.
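As a cross-check, the sketch below rebuilds the setup frame and compares the callable with an explicitly computed key, df.index.strftime('%m-%d'), which should produce the same groups. It also swaps in median(), since the question asks for mean or median. The variable names by_callable, by_key and medians are illustrative only.

```python
import pandas as pd

# Same data as in the setup above.
df = pd.DataFrame(
    {'temp_C': range(10)},
    index=pd.to_datetime([
        '2010-01-23', '2012-03-30',
        '2013-01-23', '2013-03-30',
        '2014-01-23', '2014-03-30',
        '2016-01-23', '2015-03-30',
        '2017-01-23', '2017-03-30',
    ]),
)

# Callable: applied to every element of the index.
by_callable = df.groupby('{:%m-%d}'.format).mean()

# Explicit key: one month-day string per row, computed up front.
by_key = df.groupby(df.index.strftime('%m-%d')).mean()

# Any other aggregation works on the same grouping.
medians = df.groupby(df.index.strftime('%m-%d')).median()

print(by_callable)
print(medians)
```

Grouping on the month-day string keeps each calendar date together across leap and non-leap years, whereas a day-of-year number would shift by one after February 29 in leap years.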

Upvotes: 2
