Sreedhar

Reputation: 30035

PySpark DataFrame withColumn multiple when conditions

How can I achieve the result below with multiple when conditions?

from pyspark.sql import functions as F
df = spark.createDataFrame([(5000, 'US'),(2500, 'IN'),(4500, 'AU'),(4500, 'NZ')],["Sales", "Region"])
df.withColumn('Commision', 
              F.when(F.col('Region')=='US',F.col('Sales')*0.05).\
              F.when(F.col('Region')=='IN',F.col('Sales')*0.04).\
              F.when(F.col('Region')in ('AU','NZ'),F.col('Sales')*0.04).\
              otherwise(F.col('Sales'))).show()

Upvotes: 3

Views: 14326

Answers (2)

notNull

Reputation: 31510

I think you are missing .isin in the membership check. Also, use F.when only for the first condition and chain the remaining conditions with .when (without the F. prefix):

from pyspark.sql import functions as F
df = spark.createDataFrame([(5000, 'US'),(2500, 'IN'),(4500, 'AU'),(4500, 'NZ')],["Sales", "Region"])
df.withColumn('Commision', 
              F.when(F.col('Region')=='US',F.col('Sales')*0.05).\
              when(F.col('Region')=='IN',F.col('Sales')*0.04).\
              when(F.col('Region').isin('AU','NZ'),F.col('Sales')*0.04).\
              otherwise(F.col('Sales'))).show()

#+-----+------+---------+
#|Sales|Region|Commision|
#+-----+------+---------+
#| 5000|    US|    250.0|
#| 2500|    IN|    100.0|
#| 4500|    AU|    180.0|
#| 4500|    NZ|    180.0|
#+-----+------+---------+
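
For comparison, a minimal sketch of the same logic written as a single SQL CASE expression via F.expr (assuming the same df as above); the chained when/otherwise builds an equivalent CASE internally:

from pyspark.sql import functions as F

# Single CASE expression covering all regions; ELSE keeps the raw Sales value.
df.withColumn('Commision',
              F.expr("""CASE WHEN Region = 'US' THEN Sales * 0.05
                             WHEN Region = 'IN' THEN Sales * 0.04
                             WHEN Region IN ('AU','NZ') THEN Sales * 0.04
                             ELSE Sales END""")).show()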

Upvotes: 2

akuiper

Reputation: 215057

Use otherwise after each when, nesting the conditions:

df.withColumn('Commision',
              F.when(F.col('Region') == 'US', F.col('Sales') * 0.05).otherwise(
                F.when(F.col('Region') == 'IN', F.col('Sales') * 0.04).otherwise(
                    F.when(F.col('Region').isin('AU', 'NZ'), F.col('Sales') * 0.04).otherwise(
                        F.col('Sales'))))).show()

+-----+------+---------+
|Sales|Region|Commision|
+-----+------+---------+
| 5000|    US|    250.0|
| 2500|    IN|    100.0|
| 4500|    AU|    180.0|
| 4500|    NZ|    180.0|
+-----+------+---------+
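
If the list of regions grows, the chained when can also be built up in a loop; a minimal sketch, assuming a hypothetical rates dict that is not part of the original question:

from pyspark.sql import functions as F

# Hypothetical region-to-rate mapping, for illustration only.
rates = {'US': 0.05, 'IN': 0.04, 'AU': 0.04, 'NZ': 0.04}

items = list(rates.items())
# Start the chain with F.when, then extend it with .when for the remaining entries.
commission = F.when(F.col('Region') == items[0][0], F.col('Sales') * items[0][1])
for region, rate in items[1:]:
    commission = commission.when(F.col('Region') == region, F.col('Sales') * rate)

df.withColumn('Commision', commission.otherwise(F.col('Sales'))).show()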

Upvotes: 4
