Reputation: 63
I have some time series data which contains both day time and night time values. The time entries are parsed in DateTime format, and I have a routine that is able to determine the values corresponding to night periods.
For my purposes, I would like to compute the rolling mean, but only using night time values.
Is there an easy way of doing this?
Any help is greatly appreciated!
Update: A slice through the dataframe looks like this:
0 Time Value
1 2016-02-16 08:40:14.133000 12
2 2016-02-16 11:25:14.133000 4
3 2016-02-16 23:45:14.133000 8
4 2016-03-16 08:40:14.002700 17
5 2016-03-16 23:45:14.133000 2
6 2016-03-16 23:50:14.133000 6
7 2016-03-16 23:55:14.133000 9
8 2016-04-16 08:40:14.133000 10
9 2016-04-16 11:20:14.133000 2
10 2016-04-16 12:40:14.133000 7
11 2016-04-16 23:45:14.133000 5
12 2016-05-16 08:40:14.002700 11
13 2016-05-16 23:40:14.133000 3
14 2016-05-16 23:50:14.133000 4
15 2016-06-16 08:40:14.002700 11
16 2016-06-16 10:30:14.002700 27
17 2016-06-16 23:25:14.133000 3
18 2016-06-16 23:30:14.133000 5
19 2016-07-16 08:40:14.002700 7
20 2016-07-16 11:15:14.002700 9
21 2016-07-16 23:45:14.133000 18
I am interested in computing the weekly rolling mean for night values only. In my example, night time is between 11pm and 5am.
Upvotes: 0
Views: 790
Reputation: 3855
You can add a column that records which night each row belongs to, using the date of the pandas datetime series, and then use that column to compute the rolling mean over a week:
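import numpy as np
import pandas as pd
df.Time = pd.to_datetime(df.Time)
# Night rows: 23:00 up to (but not including) 06:00
df['night'] = (df.Time.dt.hour > 22) | (df.Time.dt.hour < 6)
# Label each row with its night: rows from 23:00 on belong to the next day's night
df['date'] = df.Time.dt.normalize()
idx = df.night & (df.Time.dt.hour > 22)
df.loc[idx, 'date'] = df.loc[idx, 'date'] + pd.Timedelta(days=1)
df['rolling_mean'] = np.nan
# For each night row, average all night values within 3 days on either side (a centered one-week window)
df.loc[df.night, 'rolling_mean'] = df.loc[df.night, 'date'].apply(lambda x: df.loc[df.night & df.date.between(x - pd.Timedelta(days=3), x + pd.Timedelta(days=3)), 'Value'].mean())
df.drop(['night', 'date'], axis=1, inplace=True)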
Time Value rolling_mean
1 2016-02-16 08:40:14.133000 12 NaN
2 2016-02-16 11:25:14.133000 4 NaN
3 2016-02-16 23:45:14.133000 8 8.000000
4 2016-03-16 08:40:14.002700 17 NaN
5 2016-03-16 23:45:14.133000 2 5.666667
6 2016-03-16 23:50:14.133000 6 5.666667
7 2016-03-16 23:55:14.133000 9 5.666667
8 2016-04-16 08:40:14.133000 10 NaN
9 2016-04-16 11:20:14.133000 2 NaN
10 2016-04-16 12:40:14.133000 7 NaN
11 2016-04-16 23:45:14.133000 5 5.000000
12 2016-05-16 08:40:14.002700 11 NaN
13 2016-05-16 23:40:14.133000 3 3.500000
14 2016-05-16 23:50:14.133000 4 3.500000
15 2016-06-16 08:40:14.002700 11 NaN
16 2016-06-16 10:30:14.002700 27 NaN
17 2016-06-16 23:25:14.133000 3 4.000000
18 2016-06-16 23:30:14.133000 5 4.000000
19 2016-07-16 08:40:14.002700 7 NaN
20 2016-07-16 11:15:14.002700 9 NaN
21 2016-07-16 23:45:14.133000 18 18.000000
This is a heavy way to do it, since the apply re-scans the whole frame for every night row, and there is probably a more elegant and pythonic way, but at least you have your output!
EDIT:
There is almost certainly a better way to do it using pandas.DataFrame.rolling on a sub-dataframe containing only the night rows. See This question for an example.
Here is a suboptimal example:
df['night'] = (df.Time.dt.hour > 22) | (df.Time.dt.hour < 6)
# Label each row with its night: rows from 23:00 on belong to the next day's night
df['date'] = df.Time.dt.normalize()
idx = df.night & (df.Time.dt.hour > 22)
df.loc[idx, 'date'] = df.loc[idx, 'date'] + pd.Timedelta(days=1)
# Daily mean of the night values, then a trailing 3-day rolling mean of those daily means
night_mean = df.loc[df.night].set_index('date').resample("1d").Value.mean().rolling(window=3, min_periods=1).mean()
df = df.set_index('date').join(night_mean, rsuffix="_rolling_mean").reset_index()
df.drop(['night', 'date'], axis=1, inplace=True)
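As a rough sketch of the "rolling on the night rows only" idea, a time-based window can be applied directly to the night subset. The sample values below are made up for illustration, and the window here is a trailing 7 days (each row looks back one week), which is an assumption and differs from the centered window used in the first snippet:

```python
import pandas as pd

# Illustrative data in the same shape as the question (values are invented).
df = pd.DataFrame({
    "Time": pd.to_datetime([
        "2016-02-16 08:40:14", "2016-02-16 23:45:14",
        "2016-02-17 02:10:00", "2016-02-18 12:00:00",
        "2016-02-20 23:30:00", "2016-02-25 23:50:00",
    ]),
    "Value": [12, 8, 4, 7, 6, 10],
})

# Night rows: 23:00 up to (but not including) 06:00, same mask as above.
hour = df["Time"].dt.hour
night = (hour > 22) | (hour < 6)

# Keep only the night rows; a time-based window needs sorted timestamps.
nights = df.loc[night].sort_values("Time")

# Trailing 7-day window over night values only; each window covers the
# 7 days up to and including the row's own timestamp.
rolled = nights.set_index("Time")["Value"].rolling("7D").mean()

# Put the result back on the original row labels; day rows become NaN
# because they have no entry in the night-only series.
df["rolling_mean"] = pd.Series(rolled.to_numpy(), index=nights.index)

print(df)
```

This avoids the per-row apply and the helper date column, at the cost of using a trailing rather than centered window.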
Upvotes: 1