Danish
Danish

Reputation: 2871

Groupby count based on value of other column in pandas

I have a data frame as shown below

           B_ID  no_show Session  slot_num  
0           1      0.4      S1         1               
1           2      0.3      S1         2        
2           3      0.8      S1         3        
3     walkin1      0.0      S1         3        
4           4      0.3      S1         4        
5     walkin2      0.0      S1         4        
6           5      0.6      S1         5        
7           6      0.8      S1         6        
8           7      0.9      S1         7        
9   overbook1      0.5      S1         7        
10          8      0.4      S1         8        
11    walkin3      0.0      S1         8        
12          9      0.6      S1         9        
13  overbook2      0.5      S1         9        
14         12      0.9      S2         1        
15         13      0.5      S2         2        
16    walkin1      0.0      S2         2        
17         14      0.3      S2         3        
18         15      0.7      S2         4        
19         20      0.7      S2         5        
20  overbook1      0.5      S2         5        
21         16      0.6      S2         6        
22    walkin2      0.0      S2         6        
23  overbook2      0.5      S2         6        
24         17      0.8      S2         7        
25         19      0.3      S2         8        
26    walkin3      0.0      S2         8
27         29      0.5      S3         1        
28         07      0.8      S3         2        
29    walkin1      0.3      S3         2        
30         06      0.1      S3         3 

From the above, I would like count the number of walkin and overbook for each session.

Expected Output

Session    total_slots      num_of_walkin       num_of_overbook
S1             9                3                    2
S2             8                3                    2
S3             3                1                    0

Upvotes: 0

Views: 86

Answers (2)

Allen Qin
Allen Qin

Reputation: 19947

You can used named aggregation. This only works on pandas 0.25+.

(
    df.groupby('Session')
    .agg(no_show=('slot_num', lambda x: x.tail(1)),
         num_of_walkin=('B_ID', lambda x: x.str.startswith('walkin').sum()),
         num_of_overbook=('B_ID', lambda x: x.str.startswith('overbook').sum())
        )
    .reset_index()
)

    Session no_show num_of_walkin   num_of_overbook
0   S1      9       3               2
1   S2      8       3               2
2   S3      3       1               0

Upvotes: 3

jezrael
jezrael

Reputation: 862406

For improve perfromance I suggest not filtering per groups, but in prepriocessing to helper columns and then aggregate only sum and fort last value is used GroupBy.last:

df1 = (df.assign(b = df['B_ID'].str.startswith('walkin').astype(int),
                 c = df['B_ID'].str.startswith('overbook').astype(int))
        .groupby('Session')
        .agg(no_show=('slot_num', 'last'),
             num_of_walkin=('b', 'sum'),
             num_of_overbook=('c', 'sum')
        )
    .reset_index())
print (df1)
  Session  no_show  num_of_walkin  num_of_overbook
0      S1        9              3                2
1      S2        8              3                2
2      S3        3              1                0

Upvotes: 1

Related Questions