Reputation: 134
I want to run a query on my Dataframe with pandas to count the number of times a 0 is in 'Days' column, not null in 'SUPPLY_CNT', AND groupby 'id' column.
Example Dataframe
ID | Days | SUPPLY_CNT |
-----------------------------|
| 1561 | -11 | 15 |
| 1561 | 0 | 05 |
| 1561 | 44 | 11 |
| 1561 | 0 | |
| 1561 | 0 | 24 |
| 1561 | 56 | 24 |
| 1561 | 0 | 19 |
| 1561 | 92 | 21 |
| 2412 | -789 | 09 |
| 2412 | -456 | 09 |
| 2412 | -321 | 31 |
| 2412 | 0 | |
| 2412 | 99 | 32 |
| 2412 | 0 | 14 |
| 2412 | 0 | 18 |
| 7848 | 451 | 11 |
| 7848 | 3222| 21 |
| 7848 | 0 | 12 |
------------------------------
Expected Output:
1561 3
2412 2
7848 1
I have this query but need to groupby 'id'
(df.loc[(df['Days'] == 0) & (df['PAY_DAY_SUPPLY_CNT'].notnull())])
R Equivalent:
filter(Days==0 & !is.na(PAY_DAY_SUPPLY_CNT))%>%
group_by(id)%>%
count(Days)%>%
arrange(desc(n))
Upvotes: 1
Views: 139
Reputation: 8033
Here we first take rows in which the Days
value is 0
AND SUPPLY_CNT
is a number. Then we do groupby on that.
df.loc[(df['Days'] == 0) & ((df['SUPPLY_CNT'].notna()))].groupby('ID')['Days'].count()
Output
ID
1561 3
2412 2
7848 1
Upvotes: 0
Reputation: 863166
Create mask for test 0
values and not missing and convert to integer
s:
m = ((df['Days'] == 0) & df['SUPPLY_CNT'].notna()).astype(int)
And for count aggregate sum
:
df1 = m.groupby(df['ID']).sum().reset_index(name='count')
print (df1)
ID count
0 1561 3
1 2412 2
2 7848 1
For Series
:
s = m.groupby(df['ID']).sum()
print (s)
ID
1561 3
2412 2
7848 1
dtype: int32
Upvotes: 1
Reputation: 306
I think this will help you out.
df[df['Days']==0].groupby('ID')[['Days']].count().rename(columns = {"Days": "Count"})
Upvotes: 0