Conditional aggregation using when()

Question

I am doing an aggregation like this:

df2 = (
    df1
    .groupby('date', 'id', 'product')
    .agg(
        sf.count('new_user').alias('new_users'),
        sf.count('eligible_user').alias('eligible_users')
    
    )
)

I would like to count new_user where eligible_user is null. I have tried doing it like so:

df2 = (
    df1
    .groupby('date', 'id', 'product')
    .agg(
        sf.when(sf.col('eligible_user').isNull(), sf.count('new_user').alias('new_users')),
        sf.count('eligible_user').alias('eligible_users')       
    )
)

This errors out:

Cannot resolve eligible_user given input columns.

I'm not sure why I get this error since eligible_user is in df1 (first query works). I cannot simply put a .where() for this condition prior to the groupby(), as I want to include rows with eligible_user == 1 in the eligible_users count.

mck · Accepted Answer

To do a conditional count, you can use

sf.count(sf.when(sf.col('eligible_user').isNull(), sf.col('new_user')))

The count should be outside the when, not inside.

Conditional aggregation using when()

Answers (1)

Related Questions