DataNoob
DataNoob

Reputation: 381

How to exclude a single value from Groupby method using Pandas

I have a dataframe where I have transformed all NaN to 0 for a specific reason. In doing another calculation on the df, my group by is picking up a 0 and making it a value to perform the counts on. Any idea how to get python and pandas to exclude the 0 value? In this case the 0 represents a single row in the data. Is there a way to exclude all 0's from the groupby?

My groupby looks like this

  +----------------+----------------+-------------+
  | Team           | Method         |  Count      |
  +----------------+----------------+-------------+
  | Team 1         | Automated      |           1 |
  | Team 1         | Manual         |          14 |
  | Team 2         | Automated      |           5 |
  | Team 2         | Hybrid         |           1 |
  | Team 2         | Manual         |          25 |
  | Team 4         | 0              |           1 |
  | Team 4         | Automated      |           1 |
  | Team 4         | Hybrid         |          13 |
  +----------------+----------------+-------------+

My code looks like this (after importing excel file)

  df = df1.filnna(0)
  a = df[['Team', 'Method']]
  b = a.groupby(['Team', 'Method']).agg({'Method' : 'count'}

Upvotes: 5

Views: 11554

Answers (2)

EdChum
EdChum

Reputation: 393923

I'd filter the df prior to grouping:

In [8]:
a = df.loc[df['Method'] !=0, ['Team', 'Method']]
b = a.groupby(['Team', 'Method']).agg({'Method' : 'count'})
b
Out[8]:
                Method
Team Method           
1    Automated       1
     Manual          1
2    Automated       1
     Hybrid          1
     Manual          1
4    Automated       1
     Hybrid          1

Here we only select rows where method is not equal to 0

compare against without filtering:

In [9]:
a = df[['Team', 'Method']]
b = a.groupby(['Team', 'Method']).agg({'Method' : 'count'})
b

Out[9]:
                Method
Team Method           
1    Automated       1
     Manual          1
2    Automated       1
     Hybrid          1
     Manual          1
4    0               1
     Automated       1
     Hybrid          1

Upvotes: 4

Lajos Arpad
Lajos Arpad

Reputation: 76426

You need the filter.

The filter method returns a subset of the original object. Suppose we want to take only elements that belong to groups with a group sum greater than 2.

Example:

In [94]: sf = pd.Series([1, 1, 2, 3, 3, 3])

In [95]: sf.groupby(sf).filter(lambda x: x.sum() > 2) Out[95]: 3 3 4 3 5 3 dtype: int64

Source.

Upvotes: 0

Related Questions