Petr Petrov

Reputation: 4452

Pandas: groupby

I have a dataframe:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'member_id': [111, 111, 111, 111, 222, 222, 333, 333],
    'event_duration': [12, 242, 3, 21, 4, 76, 34, 12],
    'period': [1, 2, 2, 2, 3, 3, 4, 4],
})

   event_duration  member_id  period
0              12        111       1
1             242        111       2
2               3        111       2
3              21        111       2
4               4        222       3
5              76        222       3
6              34        333       4
7              12        333       4

I need to count the number of distinct periods for every member_id, and to compute the median of event_duration across those periods.

I use:

res = df.groupby(['member_id']).agg({'period': pd.Series.nunique, 'event_duration': np.median}).reset_index()

But this returns the median over all of a member's events, ignoring the period. What I need, for example for member 111, is the median of the per-period totals for periods 1 and 2, i.e. of [12, 266]. How can I do that?

Upvotes: 2

Views: 2116

Answers (2)

christinabo

Reputation: 1130

As far as I understand, you need to group by member_id and then by period in order to get the different values for the event_duration per period for each member_id.

If this is the case, I would do:

res = df.groupby(['member_id', 'period']).sum()

This prints:

                  event_duration
member_id period                
111       1                   12
          2                  266
222       3                   80
333       4                   46

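Then group again by member_id and take the mean of event_duration. With at most two periods per member, as in this data, the mean equals the median; call .median() instead if you need the median in general: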

res2 = res.groupby(['member_id']).mean()

This prints:

           event_duration
member_id                
111                   139
222                    80
333                    46

I hope that this is the result you want to achieve.
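
Since the question asks for the median rather than the mean, the two steps above can be sketched with .median() in the second step. This is a minimal sketch, assuming the sample dataframe from the question; the names per_period and median_per_member are only illustrative and do not appear in the original post.

```python
import pandas as pd

df = pd.DataFrame({
    'member_id': [111, 111, 111, 111, 222, 222, 333, 333],
    'event_duration': [12, 242, 3, 21, 4, 76, 34, 12],
    'period': [1, 2, 2, 2, 3, 3, 4, 4],
})

# Step 1: total event_duration for each (member_id, period) pair.
# The result is a Series indexed by a (member_id, period) MultiIndex.
per_period = df.groupby(['member_id', 'period'])['event_duration'].sum()

# Step 2: median of those per-period totals for each member.
median_per_member = per_period.groupby(level='member_id').median()

print(median_per_member)
```

For this data the medians match the means shown above (139, 80, 46), because no member has more than two periods; the two would differ once a member has three or more periods with uneven totals.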

Upvotes: 1

zipa

Reputation: 27899

Could this be what you really need:

(
    df.groupby(['member_id', 'period'], as_index=False)['event_duration'].sum()
      .groupby(['member_id'], as_index=False)
      .agg({'period': pd.Series.nunique, 'event_duration': np.median})
)

   member_id  event_duration  period
0        111             139       2
1        222              80       1
2        333              46       1
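
The same chain can also be written with named aggregation and string aggregator names, which may be easier to read and avoids passing NumPy functions to agg (recent pandas releases may warn about that). This is a sketch, assuming pandas 0.25 or later; the output column names n_periods and median_duration are illustrative choices, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'member_id': [111, 111, 111, 111, 222, 222, 333, 333],
    'event_duration': [12, 242, 3, 21, 4, 76, 34, 12],
    'period': [1, 2, 2, 2, 3, 3, 4, 4],
})

# Total event_duration per (member_id, period), kept as regular columns.
totals = df.groupby(['member_id', 'period'], as_index=False)['event_duration'].sum()

# Per member: number of distinct periods and median of the per-period totals.
summary = totals.groupby('member_id', as_index=False).agg(
    n_periods=('period', 'nunique'),
    median_duration=('event_duration', 'median'),
)

print(summary)
```

Named aggregation fixes the output column names and their order explicitly, so the result does not depend on how a given pandas version orders dictionary keys.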

Upvotes: 1
