Reputation: 121
I working on a pandas DataFrame which needs a new column that shows count of specific values in specific columns.
I tried various combinations groupby and pivot, but had problems to apply it to whole dataframe without errors.
df = pd.DataFrame([
['a', 'z'],
['a', 'x'],
['a', 'y'],
['b', 'v'],
['b', 'x'],
['b', 'v']],
columns=['col1', 'col2'])
I need to add col3 that counts 'v' values in col2 for each value in 'col1'. There is no 'v' in col2 for 'a' in col1, so it's 0 everywhere, while expected value count is 2 for 'b', also in a row where value in col2 equals 'x' instead of 'v'.
Expected output:
['a', 'z', 0]
['a', 'x', 0]
['a', 'y', 0]
['b', 'v', 2]
['b', 'x', 2]
['b', 'v', 2]
I'm looking rather for a nice pandas specific solution because the original dataframe is quite big, so things like row iterations and time expensive.
Upvotes: 2
Views: 541
Reputation: 2757
All the answers above are fine. The only caveat is that transform
can be slow when the group size is very large. Alternatively, you can try the workaround below,
(df.assign(mask = lambda x:x.col2.eq('v'))
.pipe(lambda x:x.join(x.groupby('col1')['mask'].sum().map(int).rename('col3'),on='col1')))
Upvotes: 0
Reputation: 6669
While ALollz's answer is neat and a one liner, here is another one although a two step solution introducing you to other concepts like str.contains
and np.where
!
First get the rows which have v
using np.where
and mark them as a flag:
df['col3'] = np.where(df['col2'].str.contains('v'), 1, 0)
Now perform a groupby
on col1
and sum
them:
df['col3'] = df.groupby('col1')['col3'].transform('sum')
Output:
col1 col2 col3
0 a z 0
1 a x 0
2 a y 0
3 b v 2
4 b x 2
5 b v 2
Upvotes: 0
Reputation: 59579
Create a Boolean Series checking the equality then groupby
+transform
+ sum
to count them.
df['col3'] = df.col2.eq('v').astype(int).groupby(df.col1).transform('sum')
# col1 col2 col3
#0 a z 0
#1 a x 0
#2 a y 0
#3 b v 2
#4 b x 2
#5 b v 2
Upvotes: 1