Reputation: 87
I have a data frame with activities and duration as columns.
duration = np.random.randint(4, size = 30)
activities = ['work', 'home', 'work', 'home','work', 'home','work', 'home','work', 'home','work', 'home','work', 'home','work', 'home','work', 'home','work', 'home','work', 'home','work', 'home','work', 'home','work', 'home', 'work', 'home']
activity_df = pd.DataFrame({'activities':activities, 'duration':duration})
I want to iterate through the rows and calculate the mean of the duration of the last 3 works and put it as a new feature.
Any of you guys know how to do it?
my output should be a third column where in each row the previous 3 same activities are calculated
Thank you in advance!
Upvotes: 1
Views: 49
Reputation: 862781
Use boolean indexing
, filter last 3 by tail
and get mean
:
a = activity_df.loc[activity_df['activities']=='work', 'duration'].tail(3).mean()
More general solution is create mean
s by all last 3 rows of activities
by GroupBy.tail
:
s = activity_df.set_index('activities').groupby('activities').tail(3).mean(level=0)
print (s)
EDIT:
np.random.seed(1256)
duration = np.random.randint(4, size = 30)
activities = ['work', 'home', 'work', 'home','work', 'home','work', 'home','work', 'home','work', 'home','work', 'home','work', 'home','work', 'home','work', 'home','work', 'home','work', 'home','work', 'home','work', 'home', 'work', 'home']
For your new output need groupby
with rolling
and aggregate mean
:
activity_df = pd.DataFrame({'activities':activities, 'duration':duration})
activity_df['roll'] = (activity_df.groupby('activities')['duration']
.rolling(3)
.mean()
.reset_index(level=0, drop=True))
print (activity_df)
activities duration roll
0 work 1 NaN
1 home 2 NaN
2 work 1 NaN
3 home 3 NaN
4 work 0 0.666667
5 home 1 2.000000
6 work 3 1.333333
7 home 0 1.333333
8 work 1 1.333333
9 home 3 1.333333
10 work 1 1.666667
11 home 1 1.333333
12 work 3 1.666667
13 home 2 2.000000
14 work 2 2.000000
15 home 3 2.000000
16 work 0 1.666667
17 home 2 2.333333
18 work 3 1.666667
19 home 0 1.666667
20 work 3 2.000000
21 home 0 0.666667
22 work 1 2.333333
23 home 3 1.000000
24 work 1 1.666667
25 home 2 1.666667
26 work 1 1.000000
27 home 2 2.333333
28 work 2 1.333333
29 home 1 1.666667
Upvotes: 1
Reputation: 8631
You need to look for work
activity and then take the mean of last three rows.
activity_df.loc[activity_df['activities']=='work'][-3:]['duration'].mean()
EDIT
Seems like you need to use .rolling() with window=3
on grouped data to get the mean of previous 3 row per activities
.
activity_df['rolling_mean'] = activity_df.groupby('activities').rolling(3).mean().reset_index(0, drop=True)
Output:
activities duration rolling_mean
0 work 3 NaN
1 home 1 NaN
2 work 3 NaN
3 home 0 NaN
4 work 1 2.333333
5 home 1 0.666667
6 work 3 2.333333
7 home 3 1.333333
8 work 2 2.000000
9 home 0 1.333333
10 work 2 2.333333
11 home 0 1.000000
12 work 0 1.333333
13 home 1 0.333333
14 work 0 0.666667
15 home 3 1.333333
16 work 3 1.000000
17 home 3 2.333333
18 work 0 1.000000
19 home 3 3.000000
20 work 0 1.000000
21 home 3 3.000000
22 work 1 0.333333
23 home 3 3.000000
24 work 3 1.333333
25 home 3 3.000000
26 work 0 1.333333
27 home 3 3.000000
28 work 3 2.000000
29 home 3 3.000000
Upvotes: 1