Reputation: 75
I want to get the total number of registered users and the total number of identified users,by month, from 2 tables that I join. See desired output:
Month reg_users iden_users
Jan 300 600
Feb 250 500
Mar 100 200
But I got an error:
when() missing 1 required positional argument: 'value'
Code used:
#registered vs identified
dim_customers = (spark.table(f'nn_squad7_{country}.dim_customers')
.filter(f.col('registration_date').between(start,end))
.withColumn('month', f.date_format(f.date_sub(f.col('registration_date'), 1), 'MMM'))
.selectExpr('customer_id','age','gender','registration_date','month','1 as registered')
)
df = (
spark.table(f'nn_squad7_{country}.fact_table')
.filter(f.col('date_key').between(start,end))
.filter(f.col('is_client_plus')==1)
.filter(f.col('source')=='tickets')
.filter(f.col('subtype')=='trx')
.filter(f.col('is_trx_ok') == 1)
.withColumn('week', f.date_format(f.date_sub(f.col('date_key'), 1), 'YYYY-ww'))
.withColumn('month', f.date_format(f.date_sub(f.col('date_key'), 1), 'MMM'))
.selectExpr('customer_id','1 as identified','date_key')
)
output2 = (dim_customers
.join(df,'customer_id','left')
.fillna(0, subset=['identified'])
.withColumn('month', f.date_format(f.date_sub(f.col('date_key'), 1), 'MMM'))
.groupby('month')
.agg(f.countDistinct('customer_id').alias('reg_users'),
)
.withColumn('iden_users',f.when((f.col('identified')==1)))
)
display(output2)
Any idea why I'm getting this error? A solution could be make 2 queries? My idea is to join tables and do it all together in one single query.
Upvotes: 0
Views: 48
Reputation: 42352
I guess you wanted to get the distinct count of customer ids where identified = 1
. You can do a conditional count during the aggregation using when
:
output2 = (dim_customers
.join(df,'customer_id','left')
.fillna(0, subset=['identified'])
.withColumn('month', f.date_format(f.date_sub(f.col('date_key'), 1), 'MMM'))
.groupby('month')
.agg(f.countDistinct('customer_id').alias('reg_users'),
f.countDistinct(
f.when(
(f.col('identified')==1),
f.col('customer_id')
)
).alias('iden_users')
)
)
Upvotes: 1