Reputation: 1518
I have a massive CASE condition from which I am deriving 2 fields. I could write the CASE statement twice (with exactly the same logic) to derive each field's value separately, but is there an easier way to handle this? Note that in SAS both fields can be derived in one code block.
Deriving 2 fields: [apps_per_assoc_range, sortvar3]
Ex:
assoc_rangeExpr = '''
CASE WHEN avg_apps_per_month <= avg_apps_per_month_per_assoc_str THEN 'LT or EQUAL to Store Average' \
WHEN (avg_apps_per_month > avg_apps_per_month_per_assoc_str) AND (avg_apps_per_month <= 2*avg_apps_per_month_per_assoc_str) THEN '2X Store Average' \
... continues for 30 lines ...
ELSE 'NULL' END AS apps_per_assoc_range '''

sortvar3Expr = '''
CASE WHEN avg_apps_per_month <= avg_apps_per_month_per_assoc_str THEN 1 \
WHEN (avg_apps_per_month > avg_apps_per_month_per_assoc_str) AND (avg_apps_per_month <= 2*avg_apps_per_month_per_assoc_str) THEN 2 \
... continues for 30 lines ...
ELSE 0 END AS sortvar3 '''
Regards, Sanjeeb
Upvotes: 1
Views: 605
Reputation: 373
You could use a struct: compute both derived values as fields of a single struct inside one CASE expression, then unpack the fields into separate columns. Here is an example:
import org.apache.spark.sql.functions.{col, expr}
import spark.implicits._

val df = Seq(("a", 1), ("b", 2), ("b", 1))
  .toDF("type", "count")
  // one CASE builds a struct holding both derived values
  .withColumn("description", expr("case when count > 0 then struct('gain' as scale, 'positive' as category) else struct('loss' as scale, 'negative' as category) end"))
  // unpack the struct fields into top-level columns
  .select(col("type"), col("count"), col("description.scale").as("scale"), col("description.category").as("category"))

df.show()
+----+-----+-----+--------+
|type|count|scale|category|
+----+-----+-----+--------+
|   a|    1| gain|positive|
|   b|    2| gain|positive|
|   b|    1| gain|positive|
+----+-----+-----+--------+
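Applied to the columns in the question, the same pattern in PySpark could look like the sketch below. It assumes a DataFrame df that already has the avg_apps_per_month and avg_apps_per_month_per_assoc_str columns; the remaining WHEN branches are elided just as in the question.

from pyspark.sql.functions import col, expr

# One CASE computes both derived values as fields of a struct,
# so the 30 lines of branch logic are written only once.
derivedExpr = '''
CASE WHEN avg_apps_per_month <= avg_apps_per_month_per_assoc_str
         THEN struct('LT or EQUAL to Store Average' AS apps_per_assoc_range, 1 AS sortvar3)
     WHEN (avg_apps_per_month > avg_apps_per_month_per_assoc_str)
          AND (avg_apps_per_month <= 2*avg_apps_per_month_per_assoc_str)
         THEN struct('2X Store Average' AS apps_per_assoc_range, 2 AS sortvar3)
     ELSE struct('NULL' AS apps_per_assoc_range, 0 AS sortvar3)
END
'''

result = (df
    .withColumn("derived", expr(derivedExpr))
    # unpack the struct into the two target columns
    .withColumn("apps_per_assoc_range", col("derived.apps_per_assoc_range"))
    .withColumn("sortvar3", col("derived.sortvar3"))
    .drop("derived"))

Since the condition logic lives in one place, adding or reordering a range only has to be done once, and the struct keeps the label and its sort key in sync by construction.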
Upvotes: 2