Reputation: 47
I have a data as below. Which contains, Product ID, Making cost, tax 1, tax 2, and MRP. So i have to create new Column deciding selling price of the item. Selling price show follow below condition
Example MC+TX1=70, MC+TX2=80, MRP = 100, then Selling price = 80
if MC+TX1 =170, MC+TX2 =80, MRP =100, then selling price = 80, since one of the MC+tx1 or MX+tx2 is smaller than MRP, then we will take that
MC+TX1=170, MC+TX2=180, MRP = 100, then Selling price = 100
+------------+-------------+-------+-------+-----+--+
| Product ID | Making Cost | Tax 1 | Tax 2 | MRP | |
+------------+-------------+-------+-------+-----+--+
| 12345 | 50 | 20 | 30 | 100 | |
| 23456 | 50 | 60 | 30 | 100 | |
| 34567 | 50 | 60 | 70 | 100 | |
+------------+-------------+-------+-------+-----+--+
Desired Output
| Product ID | Making Cost | Tax 1 | Tax 2 | MRP | Selling Price |
|------------|-------------|-------|-------|-----|---------------|
| 12345 | 50 | 20 | 30 | 100 | 80 |
| 23456 | 50 | 60 | 30 | 100 | 80 |
| 34567 | 50 | 60 | 70 | 100 | 100 |
Upvotes: 1
Views: 54
Reputation: 23119
You can use when for all the conditions as below
val df = spark.sparkContext.parallelize(Seq(
(12345, 50, 20, 30, 100),
(23456, 50, 60, 30, 100),
(34567, 50, 60, 70, 100)
))
.toDF("ID", "MC", "T1", "T2", "MRP")
df.withColumn("SP",
when((($"MC" + $"T1").gt($"MRP")) && (($"MC" + $"T2").gt($"MRP")), $"MRP")
.otherwise(
when((($"MC" + $"T1").lt($"MRP")) && (($"MC" + $"T2").lt($"MRP")), greatest(($"MC" + $"T1"), ($"MC" + $"T2")))
.otherwise(
when((($"MC" + $"T1").gt($"MRP")), $"MC" + $"T2")
.otherwise($"MC" + $"T1")
)
)
).show(false)
OUTPUT:
+-----+---+---+---+---+---+
|ID |MC |T1 |T2 |MRP|SP |
+-----+---+---+---+---+---+
|12345|50 |20 |30 |100|80 |
|23456|50 |60 |30 |100|80 |
|34567|50 |60 |70 |100|100|
+-----+---+---+---+---+---+
To make more efficient you can calculate all MC+T1
and M+T2
before so that you don't have to calculate every time and drop later if you don't want it.
Upvotes: 1
Reputation: 3470
You can do it with spark sql or with the dataframe API
df.withColumn("selling_price",
when((col("MC") + col("TX1") < col("MC") + col("TX2")) && (col("MC") + col("TX2") < col("MRP") , col("MC") + col("TX2"))
.when((col("MC") + col("TX1") > col("MRP")) && (col("MC") + col("TX2") > col("MRP") , col("MRP"))
.otherwise(col("MC") + col("TX1")))
Upvotes: 0