Reputation: 99
I have the following dataset:
df = pd.DataFrame(['BAT', 'BWL', 'ALLR'])
df['1'] = ['BWL', 'BAT', 'ALLR']
df['2'] = ['BAT', 'BAT', 'WK']
df
0 1 2
0 BAT BWL BAT
1 BWL BAT BAT
2 ALLR ALLR WK
How to add columns named 'BAT','BWL','ALL', and 'WK' with the count the number of BAT/BWL/ALLR/WK appearing as the value of cells for all the rows? I have over 100k rows hence want to use Pandas
The final table should look like the following.
0 1 2 BAT BWL ALLR WK
0 BAT BWL BAT 2 1 0 0
1 BWL BAT BAT 2 1 0 0
2 ALLR ALLR WK 0 0 2 1
Upvotes: 1
Views: 60
Reputation: 323336
Try with stack
+get_dummies
df=df.join(df[['colyour need']].stack().str.get_dummies().sum(level=0))
df
Out[100]:
0 1 2 ALLR BAT BWL WK
0 BAT BWL BAT 0 2 1 0
1 BWL BAT BAT 0 2 1 0
2 ALLR ALLR WK 2 0 0 1
Upvotes: 3
Reputation: 75110
Similar to YOBEN_S's answer , with pd.get_dummies
with groupby
on axis=1
and sum
:
df.join(pd.get_dummies(df).groupby(lambda x: x.split('_')[1],axis=1).sum())
0 1 2 ALLR BAT BWL WK
0 BAT BWL BAT 0 2 1 0
1 BWL BAT BAT 0 2 1 0
2 ALLR ALLR WK 2 0 0 1
Upvotes: 1