DonChilliConCarne
DonChilliConCarne

Reputation: 87

Filter the previous 3 rows with the same string and calculate the mean in python

I have a data frame with activities and duration as columns.

duration = np.random.randint(4, size = 30)
activities = ['work', 'home', 'work', 'home','work', 'home','work', 'home','work', 'home','work', 'home','work', 'home','work', 'home','work', 'home','work', 'home','work', 'home','work', 'home','work', 'home','work', 'home', 'work', 'home']
activity_df = pd.DataFrame({'activities':activities, 'duration':duration})

I want to iterate through the rows and calculate the mean of the duration of the last 3 works and put it as a new feature.

Any of you guys know how to do it?

my output should be a third column where in each row the previous 3 same activities are calculated

Thank you in advance!

Upvotes: 1

Views: 49

Answers (2)

jezrael
jezrael

Reputation: 862781

Use boolean indexing, filter last 3 by tail and get mean:

a = activity_df.loc[activity_df['activities']=='work', 'duration'].tail(3).mean()

More general solution is create means by all last 3 rows of activities by GroupBy.tail:

s = activity_df.set_index('activities').groupby('activities').tail(3).mean(level=0)
print (s)

EDIT:

np.random.seed(1256)

duration = np.random.randint(4, size = 30)
activities = ['work', 'home', 'work', 'home','work', 'home','work', 'home','work', 'home','work', 'home','work', 'home','work', 'home','work', 'home','work', 'home','work', 'home','work', 'home','work', 'home','work', 'home', 'work', 'home']

For your new output need groupby with rolling and aggregate mean:

activity_df = pd.DataFrame({'activities':activities, 'duration':duration})

activity_df['roll'] = (activity_df.groupby('activities')['duration']
                                  .rolling(3)
                                  .mean()
                                  .reset_index(level=0, drop=True))
print (activity_df)
   activities  duration      roll
0        work         1       NaN
1        home         2       NaN
2        work         1       NaN
3        home         3       NaN
4        work         0  0.666667
5        home         1  2.000000
6        work         3  1.333333
7        home         0  1.333333
8        work         1  1.333333
9        home         3  1.333333
10       work         1  1.666667
11       home         1  1.333333
12       work         3  1.666667
13       home         2  2.000000
14       work         2  2.000000
15       home         3  2.000000
16       work         0  1.666667
17       home         2  2.333333
18       work         3  1.666667
19       home         0  1.666667
20       work         3  2.000000
21       home         0  0.666667
22       work         1  2.333333
23       home         3  1.000000
24       work         1  1.666667
25       home         2  1.666667
26       work         1  1.000000
27       home         2  2.333333
28       work         2  1.333333
29       home         1  1.666667

Upvotes: 1

harpan
harpan

Reputation: 8631

You need to look for work activity and then take the mean of last three rows.

activity_df.loc[activity_df['activities']=='work'][-3:]['duration'].mean()

EDIT

Seems like you need to use .rolling() with window=3 on grouped data to get the mean of previous 3 row per activities.

activity_df['rolling_mean'] = activity_df.groupby('activities').rolling(3).mean().reset_index(0, drop=True)

Output:

    activities  duration  rolling_mean
0        work         3           NaN
1        home         1           NaN
2        work         3           NaN
3        home         0           NaN
4        work         1      2.333333
5        home         1      0.666667
6        work         3      2.333333
7        home         3      1.333333
8        work         2      2.000000
9        home         0      1.333333
10       work         2      2.333333
11       home         0      1.000000
12       work         0      1.333333
13       home         1      0.333333
14       work         0      0.666667
15       home         3      1.333333
16       work         3      1.000000
17       home         3      2.333333
18       work         0      1.000000
19       home         3      3.000000
20       work         0      1.000000
21       home         3      3.000000
22       work         1      0.333333
23       home         3      3.000000
24       work         3      1.333333
25       home         3      3.000000
26       work         0      1.333333
27       home         3      3.000000
28       work         3      2.000000
29       home         3      3.000000

Upvotes: 1

Related Questions