Pandas count specific values in a column after groupby and put into a new column

Question

I need to count a specific value within a multicolumn group ['a', 'b', 'c'] in a column 'u' and apply its value to a new column [cnt_u1, cnt_u2] no matter what value in 'u'. This is the input df:

    df = pd.DataFrame([['u1', 'a1', 'b1', 'c1'],
                       ['u1', 'a1', 'b1', 'c1'],
                       ['', 'a1', 'b1', 'c1'],
                       ['', 'a1', 'b1', 'c2'],
                       ['u2', 'a1', 'b1', 'c2'],
                       ['', 'a1', 'b1', 'c2'],
                       ['', 'a2', 'b1', 'c3'],
                       ['u2', 'a2', 'b1', 'c1'],
                       ['u2', 'a2', 'b1', 'c1'],
                       ['u2', 'a2', 'b1', 'c1'],
                       ['', 'a2', 'b1', 'c1'],
                       ['u3', 'a2', 'b3', 'c2']
                       ], columns=['u', 'a', 'b', 'c'])

Below you can find the desired output. Please note that I'm interested only is specific set of values u1, u2, i.e. u3 is skipped.

     u   a   b   c cnt_u1 cnt_u2
0   u1  a1  b1  c1     2     0
1   u1  a1  b1  c1     2     0
2       a1  b1  c1     2     0
3       a1  b1  c2     0     1
4   u2  a1  b1  c2     0     1
5       a1  b1  c2     0     1
6       a2  b1  c3     0     0
7   u2  a2  b1  c1     3     0
8   u2  a2  b1  c1     3     0
9   u2  a2  b1  c1     3     0
10      a2  b1  c1     3     0
11  u3  a2  b3  c2     0     0

What I managed to achieve is where count should be applied:

for s in ('u1', 'u2'):
    df[f'x_{s}'] = df.groupby(['a', 'b', 'c'])['u'].transform(lambda x: (x == s).any())
     u   a   b   c   x_u1   x_u2
0   u1  a1  b1  c1   True  False
1   u1  a1  b1  c1   True  False
2       a1  b1  c1   True  False
3       a1  b1  c2  False   True
4   u2  a1  b1  c2  False   True
5       a1  b1  c2  False   True
6       a2  b1  c3  False  False
7   u2  a2  b1  c1  False   True
8   u2  a2  b1  c1  False   True
9   u2  a2  b1  c1  False   True
10      a2  b1  c1  False   True
11  u3  a2  b3  c2  False  False

How get the count for each group where value in 'u' is equal to 'u1' and put into 'cnt_u1' column (the 'u2' respectively)? I don't claim the transform/any approach is the best, I'm open to anything that works as intended.

jezrael · Accepted Answer

I believe you need sum for count Trues values:

for s in ('u1', 'u2'):
    df[f'x_{s}'] = df.groupby(['a', 'b', 'c'])['u'].transform(lambda x: (x == s).sum())

Or you can assign new column and use only sum:

for s in ('u1', 'u2'):
    df[f'x_{s}'] = df.assign(u = (df.u == s).astype(int)).groupby(['a', 'b', 'c'])['u'].transform('sum')

print (df)
     u   a   b   c  x_u1  x_u2
0   u1  a1  b1  c1     2     0
1   u1  a1  b1  c1     2     0
2       a1  b1  c1     2     0
3       a1  b1  c2     0     1
4   u2  a1  b1  c2     0     1
5       a1  b1  c2     0     1
6       a2  b1  c3     0     0
7   u2  a2  b1  c1     0     3
8   u2  a2  b1  c1     0     3
9   u2  a2  b1  c1     0     3
10      a2  b1  c1     0     3
11  u3  a2  b3  c2     0     0

Pandas count specific values in a column after groupby and put into a new column

Answers (1)

Related Questions