Reputation: 8055
i am trying to do a groupby on the id column
such that i can show the number of rows in col1 that is equal to 1.
df:
id col1 col2 col3
a 1 1 1
a 0 1 1
a 1 1 1
b 1 0 1
my code:
df.groupby(['id'])[col1].count()[1]
output i got was 2. It didnt show me the values from other ids
like b.
i want:
id col1
a 2
b 1
if possible can the total rows per id also be displayed as a new column?
example:
id col1 total
a 2 3
b 1 1
Upvotes: 1
Views: 5103
Reputation: 62037
If you want to generalize the solution to include values in col1 that are not zero you can do the following. This also orders the columns correctly.
df.set_index('id')['col1'].eq(1).groupby(level=0).agg([('col1', 'sum'), ('total', 'count')]).reset_index()
id col1 total
0 a 2.0 3
1 b 1.0 1
Using a tuple in the agg
method where the first value is the column name and the second the aggregating function is new to me. I was just experimenting and it seemed to work. I don't remember seeing it in the documentation so use with caution.
Upvotes: 0
Reputation: 215127
Assuming you have only 1 and 0 in col1, you can use agg
:
df.groupby('id', as_index=False)['col1'].agg({'col1': 'sum', 'total': 'count'})
# id total col1
#0 a 3 2
#1 b 1 1
Upvotes: 1
Reputation: 389
It's because your rows which id is 'a' sums to 3. The 2 of them are identical that's why it's been grouped and considered as one then it added the unique row which contains the 0 value on its col 1. You can't group rows with different values on its rows.
Yes you can add it on your output. Just place a method how you counted all rows on your column section of your code.
Upvotes: 0