Reputation: 4807
I have a pandas dataframe df
as:
Date Val WD
1/3/2019 2.65 Thursday
1/4/2019 2.51 Friday
1/5/2019 2.95 Saturday
1/6/2019 3.39 Sunday
1/7/2019 3.39 Monday
1/12/2019 2.23 Saturday
1/13/2019 2.50 Sunday
1/14/2019 3.62 Monday
1/15/2019 3.81 Tuesday
1/16/2019 3.75 Wednesday
1/17/2019 3.69 Thursday
1/18/2019 3.47 Friday
I need to get the following df2
from above:
Date Val WD
1/3/2019 2.65 Thursday
1/4/2019 2.51 Friday
1/5/2019 3.24 Saturday
1/6/2019 3.24 Sunday
1/7/2019 3.24 Monday
1/12/2019 2.78 Saturday
1/13/2019 2.78 Sunday
1/14/2019 2.78 Monday
1/15/2019 3.81 Tuesday
1/16/2019 3.75 Wednesday
1/17/2019 3.69 Thursday
1/18/2019 3.47 Friday
Where the df2 values are updated to have average of consecutive Sat, Sun and Mon values.
i.e. average of 2.95, 3.39, 3.39
for dates 1/5/2019, 1/6/2019, 1/7/2019
in df is 3.24 and hence in df2 I have replaced the 1/5/2019, 1/6/2019, 1/7/2019
values with 3.24.
The trick has been finding the consecutive Saturday, Sunday and Monday. Not sure how to approach this.
Upvotes: 3
Views: 360
Reputation: 14103
You can use CustomBusinessDay
with pd.grouper
to create a group col:
# if you want to only find the mean if all three days are found
from pandas.tseries.offsets import CustomBusinessDay
days = CustomBusinessDay(weekmask='Tue Wed Thu Fri Sat')
df['group_col'] = df.groupby(pd.Grouper(key='Date', freq=days)).ngroup()
df.update(df[df.groupby('group_col')['Val'].transform('size').eq(3)].groupby('group_col').transform('mean'))
Date Val WD group_col
0 2019-01-03 2.650000 Thursday 0
1 2019-01-04 2.510000 Friday 1
2 2019-01-05 3.243333 Saturday 2
3 2019-01-06 3.243333 Sunday 2
4 2019-01-07 3.243333 Monday 2
5 2019-01-12 2.783333 Saturday 7
6 2019-01-13 2.783333 Sunday 7
7 2019-01-14 2.783333 Monday 7
8 2019-01-15 3.810000 Tuesday 8
9 2019-01-16 3.750000 Wednesday 9
10 2019-01-17 3.690000 Thursday 10
11 2019-01-18 3.470000 Friday 11
or if you want to find the mean of any combination of sat sun mon in the same week
days = CustomBusinessDay(weekmask='Tue Wed Thu Fri Sat')
df['group_col'] = df.groupby(pd.Grouper(key='Date', freq=days)).ngroup()
df['Val'] = df.groupby('group_col')['Val'].transform('mean')
Upvotes: 1
Reputation: 164673
One approach is to calculate a week number, then use groupby
to calculate means across specific days and map this back to your original dataframe.
df['Date'] = pd.to_datetime(df['Date'])
# consider Monday to belong to previous week
week, weekday = df['Date'].dt.week, df['Date'].dt.weekday
df['Week'] = np.where(weekday.eq(0), week - 1, week)
# take means of Fri, Sat, Sun, then map back
mask = weekday.isin([5, 6, 0])
week_val_map = df[mask].groupby('Week')['Val'].mean()
df.loc[mask, 'Val'] = df['Week'].map(week_val_map)
print(df)
Date Val WD Week
0 2019-01-03 2.650000 Thursday 1
1 2019-01-04 2.510000 Friday 1
2 2019-01-05 3.243333 Saturday 1
3 2019-01-06 3.243333 Sunday 1
4 2019-01-07 3.243333 Monday 1
5 2019-01-12 2.783333 Saturday 2
6 2019-01-13 2.783333 Sunday 2
7 2019-01-14 2.783333 Monday 2
8 2019-01-15 3.810000 Tuesday 3
9 2019-01-16 3.750000 Wednesday 3
10 2019-01-17 3.690000 Thursday 3
11 2019-01-18 3.470000 Friday 3
Upvotes: 0
Reputation: 59549
This logic creates a Series
that assigns a unique ID to groups of consecutive Sat/Sun/Mon rows in your DataFrame
. Then ensure there are 3 of them (not just Sat/Sun or Sun/Mon), and transform
those values with the mean:
import pandas as pd
#df['Date'] = pd.to_datetime(df.Date)
s = (~(df.Date.dt.dayofweek.isin([0,6])
& (df.Date - df.Date.shift(1)).dt.days.eq(1))).cumsum()
to_trans = s[s.groupby(s).transform('size').eq(3)]
df.loc[to_trans.index, 'Val'] = df.loc[to_trans.index].groupby(to_trans).Val.transform('mean')
Date Val WD
0 2019-01-03 2.650000 Thursday
1 2019-01-04 2.510000 Friday
2 2019-01-05 3.243333 Saturday
3 2019-01-06 3.243333 Sunday
4 2019-01-07 3.243333 Monday
5 2019-01-12 2.783333 Saturday
6 2019-01-13 2.783333 Sunday
7 2019-01-14 2.783333 Monday
8 2019-01-15 3.810000 Tuesday
9 2019-01-16 3.750000 Wednesday
10 2019-01-17 3.690000 Thursday
11 2019-01-18 3.470000 Friday
12 2019-01-19 3.250000 Saturday
13 2019-01-20 3.250000 Sunday
14 2019-01-21 3.250000 Monday
15 2019-01-22 5.000000 Tuesday
16 2019-01-27 2.000000 Sunday
17 2019-01-28 4.000000 Monday
18 2019-01-29 6.000000 Tuesday
19 2019-02-05 7.000000 Tuesday
20 2019-02-07 6.000000 Thursday
21 2019-02-12 9.000000 Tuesday
Date Val WD
1/3/2019 2.65 Thursday
1/4/2019 2.51 Friday
1/5/2019 2.95 Saturday
1/6/2019 3.39 Sunday
1/7/2019 3.39 Monday
1/12/2019 2.23 Saturday
1/13/2019 2.50 Sunday
1/14/2019 3.62 Monday
1/15/2019 3.81 Tuesday
1/16/2019 3.75 Wednesday
1/17/2019 3.69 Thursday
1/18/2019 3.47 Friday
1/19/2019 3.75 Saturday
1/20/2019 2.00 Sunday
1/21/2019 4.00 Monday
1/22/2019 5.00 Tuesday
1/27/2019 2.00 Sunday
1/28/2019 4.00 Monday
1/29/2019 6.00 Tuesday
2/5/2019 7.00 Tuesday
2/7/2019 6.00 Thursday
2/12/2019 9.00 Tuesday
Upvotes: 0