Reputation: 373
I am doing an aggregation like this:
df2 = (
df1
.groupby('date', 'id', 'product')
.agg(
sf.count('new_user').alias('new_users'),
sf.count('eligible_user').alias('eligible_users')
)
)
I would like to count new_user
where eligible_user
is null. I have tried doing it like so:
df2 = (
df1
.groupby('date', 'id', 'product')
.agg(
sf.when(sf.col('eligible_user').isNull(), sf.count('new_user').alias('new_users')),
sf.count('eligible_user').alias('eligible_users')
)
)
This errors out:
Cannot resolve
eligible_user
given input columns.
I'm not sure why I get this error since eligible_user
is in df1
(first query works). I cannot simply put a .where()
for this condition prior to the groupby()
, as I want to include rows with eligible_user == 1
in the eligible_users
count.
Upvotes: 1
Views: 37
Reputation: 42392
To do a conditional count, you can use
sf.count(sf.when(sf.col('eligible_user').isNull(), sf.col('new_user')))
The count should be outside the when
, not inside.
Upvotes: 2