Reputation: 823
Assume I have the following pandas DataFrame, where col_a can only take the values 1.0 or 0.0:
+-------+---------+
| score | col_a   |
+-------+---------+
| 10    | 1.0     |
| 15    | 0.0     |
| 12    | 0.0     |
| 12    | 0.0     |
+-------+---------+
I would like to create the following DataFrame, which groups by score and, for each score, counts the rows where col_a = 1.0 and the rows where col_a = 0.0:
+--------+----------+---------+
| score  | col_a_1  | col_a_0 |
+--------+----------+---------+
| 10     | 1        | 0       |
| 15     | 0        | 1       |
| 12     | 0        | 2       |
+--------+----------+---------+
I understand that this is a group-by operation, but I am not sure how to populate the counts into new columns.
Upvotes: 0
Views: 50
Reputation: 743
As your column is binary, you can simply do:
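# Summing a 0/1 column gives the number of ones per score
col_a_1 = df.groupby('score')[['col_a']].sum()
# Every remaining row in the group is a zero
col_a_0 = df.groupby('score')[['col_a']].count() - col_a_1
result = pd.concat([col_a_0.add_suffix('_0'), col_a_1.add_suffix('_1')], axis=1)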
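For reference, here is a minimal end-to-end sketch of the same sum/count idea on the sample data from the question. The names `grouped`, `ones`, `zeros` and `result` are only illustrative, and the cast to int assumes col_a really contains nothing but 0.0 and 1.0 (no NaN).

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({"score": [10, 15, 12, 12], "col_a": [1.0, 0.0, 0.0, 0.0]})

grouped = df.groupby("score")["col_a"]
ones = grouped.sum().astype(int)   # sum of a 0/1 column = number of ones
zeros = grouped.count() - ones     # all other rows in the group are zeros

# Assemble the two counts side by side and turn score back into a column
result = pd.DataFrame({"col_a_1": ones, "col_a_0": zeros}).reset_index()
print(result)
```

Note that groupby sorts the group keys by default, so the rows come out ordered by score (10, 12, 15) rather than in order of first appearance; pass sort=False to groupby if the original order matters.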
Upvotes: 0
Reputation: 30991
Define a function that counts the occurrences of 0 and 1 in the col_a column of the current group of rows:
def cnt(grp):
    # Number of rows in this group where col_a is 0 and where it is 1
    n0 = grp.col_a[grp.col_a == 0].size
    n1 = grp.col_a[grp.col_a == 1].size
    return pd.Series([n1, n0], index=['col_a_1', 'col_a_0'])
Then apply this function:
df.groupby('score', sort=False).apply(cnt).reset_index()
For your sample data, the result is:
   score  col_a_1  col_a_0
0     10        1        0
1     15        0        1
2     12        0        2
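As a possible alternative to a custom function (a different technique from the one above, shown only as a sketch), pd.crosstab can build the same counts directly. The reindex step is an assumption on my part to guarantee both columns exist even if one of the two values never appears in the data; the variable names are illustrative.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({"score": [10, 15, 12, 12], "col_a": [1.0, 0.0, 0.0, 0.0]})

# Rows are scores, columns are the distinct col_a values, cells are counts
counts = pd.crosstab(df["score"], df["col_a"])

# Force both columns to exist, in the order wanted for the output
counts = counts.reindex(columns=[1.0, 0.0], fill_value=0)
counts.columns = ["col_a_1", "col_a_0"]

result = counts.reset_index()
print(result)
```

Unlike the groupby call with sort=False, crosstab returns the scores in sorted order.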
Upvotes: 2