Reputation: 3043
I have a Spark DataFrame with three columns, and I want to merge two of them based on the value of the third. Here is an example:
+---+---+---+
|AAA|bbb|ccc|
+---+---+---+
|AAA|BBB|  E|
|AAA|BBB|  R|
|AAA|BBB|  E|
|AAA|BBB|  R|
|AAA|BBB|  R|
|AAA|BBB|  E|
+---+---+---+
I want to use the value of column AAA when the value of column ccc is E, and the value of column bbb when ccc is R. Here is the expected output:
+---+---+
|NEW|ccc|
+---+---+
|AAA|  E|
|BBB|  R|
|AAA|  E|
|BBB|  R|
|BBB|  R|
|AAA|  E|
+---+---+
Upvotes: 1
Views: 2000
Reputation: 1892
Using Spark with Scala:
With when and otherwise, there is no need for a second when if the column has only two possible values.
import org.apache.spark.sql.functions.{col, when}
val df = spark.createDataFrame(Seq(("AAA","BBB","E"),("AAA","BBB","R"),("AAA","BBB","E"),("AAA","BBB","R"),("AAA","BBB","R"),("AAA","BBB","E"))).toDF("AAA","bbb","ccc")
// Take AAA where ccc is "E"; every other row falls back to bbb.
df.withColumn("New", when(col("ccc").equalTo("E"), col("AAA")).otherwise(col("bbb"))).show
Upvotes: 0
Reputation: 5093
Using Spark SQL:
SELECT
  CASE
    WHEN ccc = 'E' THEN AAA
    ELSE bbb
  END AS new,
  ccc
FROM dataset;
Upvotes: 0
Reputation: 49260
This can be done using when. (PySpark solution shown below.)
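from pyspark.sql.functions import when
# Bracket access with the exact column names; rows matching neither when() stay null.
df.withColumn('New', when(df['ccc'] == 'E', df['AAA']).when(df['ccc'] == 'R', df['bbb'])).show()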
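The answers here differ in one detail that the example data does not exercise: what happens to a row whose ccc is neither E nor R. As a rough sketch, the plain-Python model below mimics the row-wise logic of both forms. It is not Spark code; the function names, the row values, and the third value X are all made up for illustration. A chained when without otherwise yields null (None here) for an unmatched row, while when ... otherwise falls back to bbb.

```python
# Plain-Python model of the two Spark expressions; not Spark code.
# The rows and the value "X" are hypothetical, chosen to show the edge case.
rows = [
    {"AAA": "a1", "bbb": "b1", "ccc": "E"},
    {"AAA": "a2", "bbb": "b2", "ccc": "R"},
    {"AAA": "a3", "bbb": "b3", "ccc": "X"},
]


def when_otherwise(row):
    # Models when(ccc == 'E', AAA).otherwise(bbb)
    return row["AAA"] if row["ccc"] == "E" else row["bbb"]


def chained_when(row):
    # Models when(ccc == 'E', AAA).when(ccc == 'R', bbb)
    if row["ccc"] == "E":
        return row["AAA"]
    if row["ccc"] == "R":
        return row["bbb"]
    return None  # Spark produces null when no branch matches


for row in rows:
    print(row["ccc"], when_otherwise(row), chained_when(row))
```

If ccc can only ever be E or R, the two forms give the same result, so the choice mostly comes down to whether an unexpected value should surface as null or silently take the bbb value.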
Upvotes: 5