Filter the previous 3 rows with the same string and calculate the mean in python

Question

I have a data frame with activities and duration as columns.

import numpy as np
import pandas as pd

duration = np.random.randint(4, size=30)
activities = ['work', 'home'] * 15   # 30 alternating rows: work, home, work, ...
activity_df = pd.DataFrame({'activities': activities, 'duration': duration})

I want to iterate through the rows, calculate the mean duration of the last 3 'work' rows, and add it as a new feature.

Does anyone know how to do it?

My output should be a third column where each row holds the mean duration of the previous 3 rows with the same activity.

Thank you in advance!

jezrael · Accepted Answer

Use boolean indexing, keep the last 3 matching rows with tail, and take the mean:

a = activity_df.loc[activity_df['activities']=='work', 'duration'].tail(3).mean()
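A quick check of the one-liner above on small, hand-picked data (hypothetical values, not the question's random durations), so the result can be verified by hand:

```python
import pandas as pd

# Hypothetical, deterministic data so the result is easy to verify
activity_df = pd.DataFrame({
    'activities': ['work', 'home'] * 4,
    'duration':   [1, 2, 3, 0, 2, 1, 4, 3],
})

# The boolean mask keeps only 'work' rows (durations 1, 3, 2, 4);
# tail(3) then keeps the last three of them
last_work = activity_df.loc[activity_df['activities'] == 'work', 'duration'].tail(3)
a = last_work.mean()
print(last_work.tolist(), a)   # prints: [3, 2, 4] 3.0
```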

A more general solution computes the mean of the last 3 rows of every activity with GroupBy.tail:

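# DataFrame.mean(level=0) was removed in pandas 2.0, so group on the index level instead
s = (activity_df.set_index('activities')
                .groupby('activities').tail(3)   # last 3 rows of each activity
                .groupby(level=0).mean())        # mean duration per activity
print(s)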
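The same idea without set_index, sketched on hypothetical hand-picked data: tail(3) inside each group keeps the last three rows per activity, and grouping that result a second time gives one mean per activity on current pandas versions.

```python
import pandas as pd

# Hypothetical, deterministic data (work: 1, 3, 2, 4; home: 2, 0, 1, 3)
activity_df = pd.DataFrame({
    'activities': ['work', 'home'] * 4,
    'duration':   [1, 2, 3, 0, 2, 1, 4, 3],
})

# Last three rows of every activity, original row order preserved
last3 = activity_df.groupby('activities').tail(3)

# One mean duration per activity: work -> (3+2+4)/3, home -> (0+1+3)/3
s = last3.groupby('activities')['duration'].mean()
print(s)
```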

EDIT:

import numpy as np
import pandas as pd

np.random.seed(1256)

duration = np.random.randint(4, size=30)
activities = ['work', 'home'] * 15   # same alternating 30-row list as in the question

For your new output, use groupby with rolling and aggregate with mean:

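activity_df = pd.DataFrame({'activities': activities, 'duration': duration})

# rolling(3) per activity: mean of the current row and the two earlier rows with the same activity
activity_df['roll'] = (activity_df.groupby('activities')['duration']
                                  .rolling(3)
                                  .mean()
                                  # drop the 'activities' index level so the result aligns with activity_df
                                  .reset_index(level=0, drop=True))
print(activity_df)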

   activities  duration      roll
0        work         1       NaN
1        home         2       NaN
2        work         1       NaN
3        home         3       NaN
4        work         0  0.666667
5        home         1  2.000000
6        work         3  1.333333
7        home         0  1.333333
8        work         1  1.333333
9        home         3  1.333333
10       work         1  1.666667
11       home         1  1.333333
12       work         3  1.666667
13       home         2  2.000000
14       work         2  2.000000
15       home         3  2.000000
16       work         0  1.666667
17       home         2  2.333333
18       work         3  1.666667
19       home         0  1.666667
20       work         3  2.000000
21       home         0  0.666667
22       work         1  2.333333
23       home         3  1.000000
24       work         1  1.666667
25       home         2  1.666667
26       work         1  1.000000
27       home         2  2.333333
28       work         2  1.333333
29       home         1  1.666667
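The question asks for the "previous 3" rows, which can be read two ways. The rolling(3) window above includes the current row (row 4 averages the work durations of rows 0, 2 and 4). If "previous" should instead exclude the current row, one option, sketched here on hypothetical hand-picked data, is to shift(1) within each group before rolling. The sketch also uses transform instead of rolling plus reset_index; that substitution is my own choice, not part of the original answer.

```python
import pandas as pd

# Hypothetical, deterministic data (work: 1, 3, 2, 4, 0; home: 2, 0, 1, 3, 2)
activity_df = pd.DataFrame({
    'activities': ['work', 'home'] * 5,
    'duration':   [1, 2, 3, 0, 2, 1, 4, 3, 0, 2],
})

grouped = activity_df.groupby('activities')['duration']

# Window of 3 that includes the current row (same result as the answer's rolling code);
# transform returns a Series already aligned to activity_df's index
activity_df['roll'] = grouped.transform(lambda s: s.rolling(3).mean())

# Assumed alternative reading: mean of the 3 same-activity rows *before* the current one.
# shift(1) moves each group down one row, so the current duration is never in its own window.
activity_df['prev3'] = grouped.transform(lambda s: s.shift(1).rolling(3).mean())
print(activity_df)
```

Pass min_periods=1 to rolling if rows with fewer than 3 earlier matches should get a partial mean instead of NaN.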
