Reputation: 121
I need to count a specific value within a multicolumn group ['a', 'b', 'c']
in a column 'u'
and apply its value to a new column [cnt_u1, cnt_u2]
no matter what value in 'u'
. This is the input df:
df = pd.DataFrame([['u1', 'a1', 'b1', 'c1'],
['u1', 'a1', 'b1', 'c1'],
['', 'a1', 'b1', 'c1'],
['', 'a1', 'b1', 'c2'],
['u2', 'a1', 'b1', 'c2'],
['', 'a1', 'b1', 'c2'],
['', 'a2', 'b1', 'c3'],
['u2', 'a2', 'b1', 'c1'],
['u2', 'a2', 'b1', 'c1'],
['u2', 'a2', 'b1', 'c1'],
['', 'a2', 'b1', 'c1'],
['u3', 'a2', 'b3', 'c2']
], columns=['u', 'a', 'b', 'c'])
Below you can find the desired output. Please note that I'm interested only is specific set of values u1, u2
, i.e. u3
is skipped.
u a b c cnt_u1 cnt_u2
0 u1 a1 b1 c1 2 0
1 u1 a1 b1 c1 2 0
2 a1 b1 c1 2 0
3 a1 b1 c2 0 1
4 u2 a1 b1 c2 0 1
5 a1 b1 c2 0 1
6 a2 b1 c3 0 0
7 u2 a2 b1 c1 3 0
8 u2 a2 b1 c1 3 0
9 u2 a2 b1 c1 3 0
10 a2 b1 c1 3 0
11 u3 a2 b3 c2 0 0
What I managed to achieve is where count should be applied:
for s in ('u1', 'u2'):
df[f'x_{s}'] = df.groupby(['a', 'b', 'c'])['u'].transform(lambda x: (x == s).any())
u a b c x_u1 x_u2
0 u1 a1 b1 c1 True False
1 u1 a1 b1 c1 True False
2 a1 b1 c1 True False
3 a1 b1 c2 False True
4 u2 a1 b1 c2 False True
5 a1 b1 c2 False True
6 a2 b1 c3 False False
7 u2 a2 b1 c1 False True
8 u2 a2 b1 c1 False True
9 u2 a2 b1 c1 False True
10 a2 b1 c1 False True
11 u3 a2 b3 c2 False False
How get the count for each group where value in 'u'
is equal to 'u1' and put into 'cnt_u1'
column (the 'u2'
respectively)? I don't claim the transform/any approach is the best, I'm open to anything that works as intended.
Upvotes: 2
Views: 60
Reputation: 863681
I believe you need sum
for count True
s values:
for s in ('u1', 'u2'):
df[f'x_{s}'] = df.groupby(['a', 'b', 'c'])['u'].transform(lambda x: (x == s).sum())
Or you can assign new column and use only sum
:
for s in ('u1', 'u2'):
df[f'x_{s}'] = df.assign(u = (df.u == s).astype(int)).groupby(['a', 'b', 'c'])['u'].transform('sum')
print (df)
u a b c x_u1 x_u2
0 u1 a1 b1 c1 2 0
1 u1 a1 b1 c1 2 0
2 a1 b1 c1 2 0
3 a1 b1 c2 0 1
4 u2 a1 b1 c2 0 1
5 a1 b1 c2 0 1
6 a2 b1 c3 0 0
7 u2 a2 b1 c1 0 3
8 u2 a2 b1 c1 0 3
9 u2 a2 b1 c1 0 3
10 a2 b1 c1 0 3
11 u3 a2 b3 c2 0 0
Upvotes: 3