Reputation: 381
I have a dataframe where I have transformed all NaN to 0 for a specific reason. In doing another calculation on the df, my group by is picking up a 0 and making it a value to perform the counts on. Any idea how to get python and pandas to exclude the 0 value? In this case the 0 represents a single row in the data. Is there a way to exclude all 0's from the groupby?
My groupby looks like this
+----------------+----------------+-------------+
| Team | Method | Count |
+----------------+----------------+-------------+
| Team 1 | Automated | 1 |
| Team 1 | Manual | 14 |
| Team 2 | Automated | 5 |
| Team 2 | Hybrid | 1 |
| Team 2 | Manual | 25 |
| Team 4 | 0 | 1 |
| Team 4 | Automated | 1 |
| Team 4 | Hybrid | 13 |
+----------------+----------------+-------------+
My code looks like this (after importing excel file)
df = df1.filnna(0)
a = df[['Team', 'Method']]
b = a.groupby(['Team', 'Method']).agg({'Method' : 'count'}
Upvotes: 5
Views: 11554
Reputation: 393923
I'd filter the df prior to grouping:
In [8]:
a = df.loc[df['Method'] !=0, ['Team', 'Method']]
b = a.groupby(['Team', 'Method']).agg({'Method' : 'count'})
b
Out[8]:
Method
Team Method
1 Automated 1
Manual 1
2 Automated 1
Hybrid 1
Manual 1
4 Automated 1
Hybrid 1
Here we only select rows where method is not equal to 0
compare against without filtering:
In [9]:
a = df[['Team', 'Method']]
b = a.groupby(['Team', 'Method']).agg({'Method' : 'count'})
b
Out[9]:
Method
Team Method
1 Automated 1
Manual 1
2 Automated 1
Hybrid 1
Manual 1
4 0 1
Automated 1
Hybrid 1
Upvotes: 4
Reputation: 76426
You need the filter
.
The
filter
method returns a subset of the original object. Suppose we want to take only elements that belong to groups with a group sum greater than 2.
Example:
In [94]: sf = pd.Series([1, 1, 2, 3, 3, 3])
In [95]: sf.groupby(sf).filter(lambda x: x.sum() > 2) Out[95]: 3 3 4 3 5 3 dtype: int64
Upvotes: 0