Reputation: 1978
I'm "translating" a series of operations that were in Tableau into a pyspark script. One column in specific I'm having serious trouble to replicate, here the Tableu version of the code:
min(
    min(
        IF [COLUMN_A] > 0 THEN [COLUMN_B] ELSE 9999 END,
        IF [COLUMN_C] > 0 THEN [COLUMN_D] ELSE 99999 END),
    min(
        IF [COLUMN_E] > 0 THEN [COLUMN_F] ELSE 99999 END,
        IF [COLUMN_G] != 0 THEN [COLUMN_H] ELSE 99999 END)
)
Is it possible to create a version of that code using the when and otherwise functions in PySpark? I've been trying, but it seems I can't solve this...
Upvotes: 1
Views: 175
Reputation: 4089
PySpark definitely supports chaining complex conditions. I suspect your point of failure is in the syntax, or in not using fun.lit to wrap the 9999 scalar.
Anyway, assuming your columns are called 'A', 'B', ... 'H', the following should work:
import pyspark.sql.functions as fun
...
fun.least(
    fun.least(
        fun.when(fun.col('A') > 0, fun.col('B')).otherwise(fun.lit(9999)),
        fun.when(fun.col('C') > 0, fun.col('D')).otherwise(fun.lit(9999))
    ),
    fun.least(
        fun.when(fun.col('E') > 0, fun.col('F')).otherwise(fun.lit(9999)),
        fun.when(fun.col('G') != 0, fun.col('H')).otherwise(fun.lit(9999))
    )
)
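For completeness, here is a minimal sketch of how that expression could be attached to a DataFrame (assuming a DataFrame named df with numeric columns 'A' through 'H', and a hypothetical output column name 'MIN_RESULT'). Since fun.least accepts any number of columns, the nested calls can also be flattened:

import pyspark.sql.functions as fun

# Hypothetical usage: df and 'MIN_RESULT' are placeholder names
df = df.withColumn(
    'MIN_RESULT',
    fun.least(
        fun.when(fun.col('A') > 0, fun.col('B')).otherwise(fun.lit(9999)),
        fun.when(fun.col('C') > 0, fun.col('D')).otherwise(fun.lit(9999)),
        fun.when(fun.col('E') > 0, fun.col('F')).otherwise(fun.lit(9999)),
        fun.when(fun.col('G') != 0, fun.col('H')).otherwise(fun.lit(9999)),
    )
)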
Upvotes: 1