Reputation: 75
Once I try my code, I get the following error from the console:
AnalysisException: cannot resolve '1 as identified' given input columns: [LPay_user, spark_catalog.nn_TEAM_es.fact_table.customer_id, identified, HELLO_pay_date, spark_catalog.nn_TEAM_es.fact_table.ticket_id];;
Here is the code used:
from pyspark.sql import functions as f  # `spark`, `country`, and `dim_customers` are defined earlier
start = '2020-10-20'
end = '2021-01-20'
identified = (spark.table(f'nn_TEAM_{country}.fact_table')
.filter(f.col('date_key').between(start,end))
.filter(f.col('is_HELLO_plus')==1)
.filter(f.col('source')=='tickets')
.filter(f.col('subtype')=='trx')
.filter(f.col('is_trx_ok')==1)
.withColumn('week', f.date_format(f.date_sub(f.col('date_key'), 1), 'YYYY-ww'))
.withColumn('month', f.date_format(f.date_sub(f.col('date_key'), 1), 'M'))
.selectExpr('customer_id','ticket_id','1 as identified')
)
output = (identified
.join(dim_customers,'customer_id','left')
.withColumn('segment_group',
f.when((f.col('HEL_user')==1),'HELLO_user')
.when((f.col('HEL_user')==0),'NO_HELLO_user')
.when((f.col('1 as identified').isNull()) & (f.col('HELLO_user')==1),'HELLO_user_no_identified')
)
.groupby('segment_group')
.agg(
f.countDistinct('customer_id').alias('customers'),
f.countDistinct('ticket_id').alias('tickets')
))
As you can see, the column "1 as identified" is selected, so I don't understand why I'm getting this error.
What I'm trying to do is create a segmentation for customers based on the columns "1 as identified" and "HEL_user".
Can someone explain to me how to fix this error? Thanks!
Upvotes: 0
Views: 48
Reputation: 42352
When you select `1 as identified`, you're creating a new column called `identified` which contains all 1s. When you want to select this column later, you should select `identified`, because the column is called `identified`, not `1 as identified`.
Upvotes: 1