Reputation: 1518
I have a massive CASE condition from which I am deriving 2 fields. I could write the CASE statement twice (with exactly the same logic) to derive each field's value separately, but is there an easier way to handle this? Note that in SAS both fields can be derived in one code block.
Deriving 2 fields: [apps_per_assoc_range, sortvar3]
Ex:
assoc_rangeExpr = '''
CASE WHEN avg_apps_per_month <= avg_apps_per_month_per_assoc_str THEN 'LT or EQUAL to Store Average' \
WHEN (avg_apps_per_month > avg_apps_per_month_per_assoc_str) AND (avg_apps_per_month <= 2*avg_apps_per_month_per_assoc_str) THEN '2X Store Average' \
... continues for 30 lines ...
ELSE 'NULL' END AS apps_per_assoc_range '''

sortvar3Expr = '''
CASE WHEN avg_apps_per_month <= avg_apps_per_month_per_assoc_str THEN 1 \
WHEN (avg_apps_per_month > avg_apps_per_month_per_assoc_str) AND (avg_apps_per_month <= 2*avg_apps_per_month_per_assoc_str) THEN 2 \
... continues for 30 lines ...
ELSE 0 END AS sortvar3 '''
Regards, Sanjeeb
Upvotes: 1
Views: 605
Reputation: 373
You could use a struct: compute both derived values as fields of a single struct inside one CASE expression, then unpack the fields into separate columns. Here is an example:
import org.apache.spark.sql.functions.{col, expr}
import spark.implicits._

val df = Seq(("a", 1), ("b", 2), ("b", 1))
  .toDF("type", "count")
  // one CASE builds a struct holding both derived values
  .withColumn("description", expr("case when count > 0 then struct('gain' as scale, 'positive' as category) else struct('loss' as scale, 'negative' as category) end"))
  // unpack the struct fields into top-level columns
  .select(col("type"), col("count"), col("description.scale").as("scale"), col("description.category").as("category"))

df.show()
+----+-----+-----+--------+
|type|count|scale|category|
+----+-----+-----+--------+
|   a|    1| gain|positive|
|   b|    2| gain|positive|
|   b|    1| gain|positive|
+----+-----+-----+--------+
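Applied to the columns in the question, the same pattern in PySpark could look like the sketch below. It assumes a DataFrame df that already has the avg_apps_per_month and avg_apps_per_month_per_assoc_str columns; the remaining WHEN branches are elided just as in the question.

from pyspark.sql.functions import col, expr

# One CASE computes both derived values as fields of a struct,
# so the 30 lines of branch logic are written only once.
derivedExpr = '''
CASE WHEN avg_apps_per_month <= avg_apps_per_month_per_assoc_str
         THEN struct('LT or EQUAL to Store Average' AS apps_per_assoc_range, 1 AS sortvar3)
     WHEN (avg_apps_per_month > avg_apps_per_month_per_assoc_str)
          AND (avg_apps_per_month <= 2*avg_apps_per_month_per_assoc_str)
         THEN struct('2X Store Average' AS apps_per_assoc_range, 2 AS sortvar3)
     ELSE struct('NULL' AS apps_per_assoc_range, 0 AS sortvar3)
END
'''

result = (df
    .withColumn("derived", expr(derivedExpr))
    # unpack the struct into the two target columns
    .withColumn("apps_per_assoc_range", col("derived.apps_per_assoc_range"))
    .withColumn("sortvar3", col("derived.sortvar3"))
    .drop("derived"))

Since the condition logic lives in one place, adding or reordering a range only has to be done once, and the struct keeps the label and its sort key in sync by construction.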
Upvotes: 2