flpn

Reputation: 1978

Create column based on complex condition in pyspark

I'm "translating" a series of operations that were in Tableau into a pyspark script. One column in specific I'm having serious trouble to replicate, here the Tableu version of the code:

min(
     min(
          IF [COLUMN_A] > 0 THEN [COLUMN_B] ELSE 9999 END,
          IF [COLUMN_C] > 0 THEN [COLUMN_D] ELSE 99999 END),
     min(
          IF [COLUMN_E] > 0 THEN [COLUMN_F] ELSE 99999 END,
          IF [COLUMN_G] != 0 THEN [COLUMN_H] ELSE 99999 END)
)

Is it possible to create a version of that code using the when and otherwise functions in PySpark? I've been trying, but it seems I can't solve this...

Upvotes: 1

Views: 175

Answers (1)

andrew

Reputation: 4089

PySpark definitely supports chaining complex conditions like this. I suspect the point of failure is in your syntax, or in not wrapping the 9999 scalar with fun.lit.

Anyway, assuming your columns are called 'A', 'B', ... 'H', the following should work:

import pyspark.sql.functions as fun
...
fun.least(
    fun.least(
        fun.when(fun.col('A') > 0, fun.col('B')).otherwise(fun.lit(9999)),
        fun.when(fun.col('C') > 0, fun.col('D')).otherwise(fun.lit(9999))
    ),
    fun.least(
        fun.when(fun.col('E') > 0, fun.col('F')).otherwise(fun.lit(9999)),
        fun.when(fun.col('G') != 0, fun.col('H')).otherwise(fun.lit(9999))
    )
)
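
For context, here is a minimal, self-contained sketch of how you might attach that expression to a DataFrame with withColumn. The SparkSession setup, the sample row, and the MIN_VALUE column name are just assumptions for illustration:

import pyspark.sql.functions as fun
from pyspark.sql import SparkSession

# hypothetical session and sample data, only to show the expression in use
spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [(1, 10, 0, 20, 5, 30, 2, 40)],
    ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
)

# each inner least() mirrors one of the nested min() calls in the Tableau expression
min_expr = fun.least(
    fun.least(
        fun.when(fun.col('A') > 0, fun.col('B')).otherwise(fun.lit(9999)),
        fun.when(fun.col('C') > 0, fun.col('D')).otherwise(fun.lit(9999)),
    ),
    fun.least(
        fun.when(fun.col('E') > 0, fun.col('F')).otherwise(fun.lit(9999)),
        fun.when(fun.col('G') != 0, fun.col('H')).otherwise(fun.lit(9999)),
    ),
)

df = df.withColumn('MIN_VALUE', min_expr)
df.show()

With the sample row above, A > 0 keeps B (10), C == 0 falls back to 9999, E > 0 keeps F (30), and G != 0 keeps H (40), so MIN_VALUE comes out as 10.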

Upvotes: 1
