Reputation: 30035
How can I achieve the below with multiple when conditions?
from pyspark.sql import functions as F
df = spark.createDataFrame([(5000, 'US'),(2500, 'IN'),(4500, 'AU'),(4500, 'NZ')],["Sales", "Region"])
df.withColumn('Commision',
F.when(F.col('Region')=='US',F.col('Sales')*0.05).\
F.when(F.col('Region')=='IN',F.col('Sales')*0.04).\
F.when(F.col('Region')in ('AU','NZ'),F.col('Sales')*0.04).\
otherwise(F.col('Sales'))).show()
Upvotes: 3
Views: 14326
Reputation: 31510
I think you are missing .isin in the when condition. Also, use F.when only for the first condition; chain the subsequent conditions with .when:
from pyspark.sql import functions as F
df = spark.createDataFrame([(5000, 'US'),(2500, 'IN'),(4500, 'AU'),(4500, 'NZ')],["Sales", "Region"])
df.withColumn('Commision',
F.when(F.col('Region')=='US',F.col('Sales')*0.05).\
when(F.col('Region')=='IN',F.col('Sales')*0.04).\
when(F.col('Region').isin ('AU','NZ'),F.col('Sales')*0.04).\
otherwise(F.col('Sales'))).show()
#+-----+------+---------+
#|Sales|Region|Commision|
#+-----+------+---------+
#| 5000| US| 250.0|
#| 2500| IN| 100.0|
#| 4500| AU| 180.0|
#| 4500| NZ| 180.0|
#+-----+------+---------+
Upvotes: 2
Reputation: 215057
Use otherwise after when:
df.withColumn('Commision',
F.when(F.col('Region') == 'US', F.col('Sales') * 0.05).otherwise(
F.when(F.col('Region') == 'IN', F.col('Sales') * 0.04).otherwise(
F.when(F.col('Region').isin('AU', 'NZ'), F.col('Sales') * 0.04).otherwise(
F.col('Sales'))))).show()
+-----+------+---------+
|Sales|Region|Commision|
+-----+------+---------+
| 5000| US| 250.0|
| 2500| IN| 100.0|
| 4500| AU| 180.0|
| 4500| NZ| 180.0|
+-----+------+---------+
Upvotes: 4