Reputation: 88
I would like to use a Pandas groupby with multiple aggregation functions, but also apply a condition per aggregation. Imagine having this data as an example:
df = pd.DataFrame({
    'id': ['a', 'a', 'a', 'b', 'b'],
    'type': ['in_scope', 'in_scope', 'exclude', 'in_scope', 'exclude'],
    'value': [5, 5, 99, 20, 99]
})
INPUT DATA:
| id | type     | value |
|----|----------|-------|
| a  | in_scope | 5     |
| a  | in_scope | 5     |
| a  | exclude  | 99    |
| b  | in_scope | 20    |
| b  | exclude  | 99    |
And I want to do a Pandas groupby like this:
df.groupby('id').agg(
    num_records=('id', 'size'),
    sum_value=('value', 'sum')
)
OUTPUT OF SIMPLE GROUPBY:
| id | num_records | sum_value |
|----|-------------|-----------|
| a | 3 | 109 |
| b | 2 | 119 |
However, I would like the sum to depend on a condition, namely that only the records whose type is in_scope should be included. Note that the first aggregation should still use the entire table. In short, this is the desired output:
DESIRED OUTPUT OF GROUPBY:
| id | num_records | sum_value_in_scope |
|----|-------------|--------------------|
| a | 3 | 10 |
| b | 2 | 20 |
I was thinking about passing two arguments to a lambda function, but I do not succeed. Of course, it can be solved by performing two separate groupbys on filtered and unfiltered data and combine them together afterwards. But I was hoping there was a shorter and more elegant way.
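For completeness, the two-groupby workaround described above might look like this (a sketch using the example data; the name `result` is just for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'id': ['a', 'a', 'a', 'b', 'b'],
    'type': ['in_scope', 'in_scope', 'exclude', 'in_scope', 'exclude'],
    'value': [5, 5, 99, 20, 99]
})

# Aggregate the full table and the filtered table separately...
num_records = df.groupby('id').size().rename('num_records')
sum_in_scope = (
    df[df['type'] == 'in_scope']
    .groupby('id')['value'].sum()
    .rename('sum_value_in_scope')
)

# ...then combine the two results on the shared 'id' index.
result = pd.concat([num_records, sum_in_scope], axis=1)
```

It works, but it walks the data twice and needs the extra combine step, which is what the question hopes to avoid.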
Upvotes: 2
Views: 3931
Reputation: 28699
Updated answer: Create a temporary column that contains the value only for rows where type is in_scope, then aggregate:
# assumes: import numpy as np, import pandas as pd
(
    df.assign(temp=np.where(df["type"] == "in_scope", df["value"], None))
    .groupby("id", as_index=False)
    .agg(num_records=("type", "size"), sum_value=("temp", "sum"))
)
id num_records sum_value
a 3 10
b 2 20
Upvotes: 1
Reputation: 2791
Unfortunately, you cannot do this with aggregate; however, you can do it in one step with apply and a custom function:
def f(x):
    d = {}
    d['num_records'] = len(x)
    d['sum_value_in_scope'] = x.loc[x['type'] == 'in_scope', 'value'].sum()
    return pd.Series(d, index=['num_records', 'sum_value_in_scope'])

df.groupby('id').apply(f)
The comparison x['type'] == 'in_scope' builds a boolean mask, which directly filters the values that are summed. (If you already have a boolean column, you can use it as the mask as-is.) For more complex conditions, it is better to use df.query('<your query here>')
to get the subset of the data (there are optimizations under the hood which make it faster than most other methods).
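As an illustration of the df.query suggestion (a sketch using the asker's example data, where the condition lives in the string `type` column), the custom function could be written as:

```python
import pandas as pd

df = pd.DataFrame({
    'id': ['a', 'a', 'a', 'b', 'b'],
    'type': ['in_scope', 'in_scope', 'exclude', 'in_scope', 'exclude'],
    'value': [5, 5, 99, 20, 99]
})

def f(x):
    # query() filters each group's rows before the sum is taken.
    return pd.Series({
        'num_records': len(x),
        'sum_value_in_scope': x.query("type == 'in_scope'")['value'].sum(),
    })

result = df.groupby('id').apply(f)
```

The query string is evaluated against the group's columns, so the same function works unchanged for more involved conditions (e.g. combining several columns with `and`/`or`).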
Upvotes: 2