NeedHelp

Reputation: 83

How to calculate daily averages from noon to noon with pandas?

I am fairly new to Python and pandas, so I apologise in advance for any misunderstandings.

I have a pandas DataFrame with hourly values, looking something like this:

2014-04-01 09:00:00 52.9    41.1    36.3
2014-04-01 10:00:00 56.4    41.6    70.8
2014-04-01 11:00:00 53.3    41.2    49.6
2014-04-01 12:00:00 50.4    39.5    36.6
2014-04-01 13:00:00 51.1    39.2    33.3
2016-11-30 16:00:00 16.0    13.5    36.6
2016-11-30 17:00:00 19.6    17.4    44.3

Now I need to calculate 24-hour average values for each column, with each window running from noon to noon (for example, from 2014-04-01 12:00 to 2014-04-02 11:00). In other words, I want daily averages from noon to noon.

Unfortunately, I have no idea how to do that. I have read some suggestions to use groupby, but I don't really know how...

Thank you very much in advance! Any help is appreciated!!

Upvotes: 8

Views: 511

Answers (3)

ALollz

Reputation: 59579

For newer versions of pandas (>= 1.1.0) use the offset argument:

df.resample('24H', offset='12H').mean()

For older versions, use the base argument instead (it was deprecated in pandas 1.1.0 and removed in 2.0).

A day is 24 hours, so with a 24-hour frequency a base of 12 makes every bin run from noon to the following noon. resample returns every bin between the first and last timestamp, including empty ones, so you can chain .dropna(how='all') if you don't need the all-NaN rows. (I assume you have a DatetimeIndex; if not, you can use the on argument of resample to specify your datetime column.)

df.resample('24H', base=12).mean()
#df.groupby(pd.Grouper(level=0, base=12, freq='24H')).mean() # Equivalent 

                         1      2          3
0                                           
2014-03-31 12:00:00  54.20  41.30  52.233333
2014-04-01 12:00:00  50.75  39.35  34.950000
2014-04-02 12:00:00    NaN    NaN        NaN
2014-04-03 12:00:00    NaN    NaN        NaN
2014-04-04 12:00:00    NaN    NaN        NaN
...                    ...    ...        ...
2016-11-26 12:00:00    NaN    NaN        NaN
2016-11-27 12:00:00    NaN    NaN        NaN
2016-11-28 12:00:00    NaN    NaN        NaN
2016-11-29 12:00:00    NaN    NaN        NaN
2016-11-30 12:00:00  17.80  15.45  40.450000
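
For reference, here is a self-contained sketch of the offset variant applied to the sample rows from the question, including the .dropna(how='all') step and the on argument mentioned in the text. The column names a, b, c and timestamp are placeholders I made up, and I use the lowercase 'h' alias on the assumption that your pandas version accepts it (recent releases warn about the uppercase 'H').

```python
import pandas as pd

# Sample rows copied from the question; "a", "b", "c" are placeholder column names.
index = pd.to_datetime([
    "2014-04-01 09:00", "2014-04-01 10:00", "2014-04-01 11:00",
    "2014-04-01 12:00", "2014-04-01 13:00",
    "2016-11-30 16:00", "2016-11-30 17:00",
])
df = pd.DataFrame(
    {
        "a": [52.9, 56.4, 53.3, 50.4, 51.1, 16.0, 19.6],
        "b": [41.1, 41.6, 41.2, 39.5, 39.2, 13.5, 17.4],
        "c": [36.3, 70.8, 49.6, 36.6, 33.3, 36.6, 44.3],
    },
    index=index,
)

# 24-hour bins whose edges fall on 12:00; bins without any data are dropped.
noon_means = df.resample("24h", offset="12h").mean().dropna(how="all")
print(noon_means)

# If the timestamps live in a column instead of the index, `on` names that column.
# "timestamp" is a placeholder name for that column.
as_column = df.rename_axis("timestamp").reset_index()
same_means = (
    as_column.resample("24h", on="timestamp", offset="12h")
    .mean()
    .dropna(how="all")
)
```

Only the three bins that contain data remain, labelled by the noon at which each bin starts.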

Upvotes: 8

mjspier

Reputation: 6536

You can shift the timestamps back by 12 hours and then resample at daily frequency.

from io import StringIO
import pandas as pd

data = """
2014-04-01 09:00:00,52.9,41.1,36.3
2014-04-01 10:00:00,56.4,41.6,70.8
2014-04-01 11:00:00,53.3,41.2,49.6
2014-04-01 12:00:00,50.4,39.5,36.6
2014-04-01 13:00:00,51.1,39.2,33.3
2016-11-30 16:00:00,16.0,13.5,36.6
2016-11-30 17:00:00,19.6,17.4,44.3
"""

df = pd.read_csv(StringIO(data), sep=',', header=None, index_col=0)

df.index = pd.to_datetime(df.index)
# shift by 12 hours
df.index = df.index - pd.Timedelta(hours=12)
# resample by calendar day and drop empty days;
# each label is the date on which the noon-to-noon window starts
df.resample('D').mean().dropna()
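
One thing to be aware of with the shift: the result is labelled with midnight dates, not with the noon at which each window starts. If you want noon labels, you can shift the result back afterwards. This is a minimal sketch on a single placeholder column (value), working on a copy so the original index is left untouched.

```python
import pandas as pd

# A few hourly rows from the question; "value" is a placeholder column name.
hourly = pd.DataFrame(
    {"value": [52.9, 56.4, 53.3, 50.4, 51.1]},
    index=pd.to_datetime([
        "2014-04-01 09:00", "2014-04-01 10:00", "2014-04-01 11:00",
        "2014-04-01 12:00", "2014-04-01 13:00",
    ]),
)

half_day = pd.Timedelta(hours=12)

# Shift a copy back by 12 hours so noon-to-noon windows line up with calendar days.
shifted = hourly.copy()
shifted.index = shifted.index - half_day

# Average per calendar day and drop days without data.
daily = shifted.resample("D").mean().dropna()

# Undo the shift on the labels so each row is labelled with its starting noon.
daily.index = daily.index + half_day
print(daily)
```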

Upvotes: 0

Quang Hoang

Reputation: 150805

You could subtract 12 hours from the index, normalize the result to midnight, and group by it:

df.groupby((df.index - pd.to_timedelta('12:00:00')).normalize()).mean()
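
To see what the grouping key looks like, here is a small self-contained sketch on one placeholder column (value) built from the question's sample rows. Unlike resample, groupby only produces groups that actually occur in the data, so there are no all-NaN rows to drop.

```python
import pandas as pd

# Rows from the question; "value" is a placeholder column name.
df = pd.DataFrame(
    {"value": [52.9, 56.4, 53.3, 50.4, 51.1, 16.0, 19.6]},
    index=pd.to_datetime([
        "2014-04-01 09:00", "2014-04-01 10:00", "2014-04-01 11:00",
        "2014-04-01 12:00", "2014-04-01 13:00",
        "2016-11-30 16:00", "2016-11-30 17:00",
    ]),
)

# Subtract 12 hours, then truncate to midnight: every timestamp in the same
# noon-to-noon window ends up with the same key (the date the window starts on).
key = (df.index - pd.to_timedelta("12:00:00")).normalize()

noon_means = df.groupby(key).mean()
print(noon_means)
```

The keys are plain dates at midnight, each standing for the window that starts at noon on that date.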

Upvotes: 3
