aName

Reputation: 3043

Spark: how to merge two columns based on a condition

I have a Spark DataFrame with three columns, and I want to merge two of them based on the third. Here is an example:

+---+---+---+
|AAA|bbb|ccc|
+---+---+---+
|AAA|BBB|  E|
|AAA|BBB|  R|
|AAA|BBB|  E|
|AAA|BBB|  R|
|AAA|BBB|  R|
|AAA|BBB|  E|
+---+---+---+

I want to use the value of column AAA when column ccc is E, and the value of column bbb when ccc is R. Here is the expected output:

+---+---+
|NEW|ccc|
+---+---+
|AAA|  E|
|BBB|  R|
|AAA|  E|
|BBB|  R|
|BBB|  R|
|AAA|  E|
+---+---+

Upvotes: 1

Views: 2000

Answers (3)

Mahesh Gupta

Reputation: 1892

Using Spark (Scala):

With `when` and `otherwise`, you don't need a second `when` if the column has only two possible values.

import org.apache.spark.sql.functions.{col, when}
val df = spark.createDataFrame(Seq(("AAA","BBB","E"),("AAA","BBB","R"),("AAA","BBB","E"),("AAA","BBB","R"),("AAA","BBB","R"),("AAA","BBB","E"))).toDF("AAA","bbb","ccc")
// Take AAA when ccc is "E"; every other row (including "R") takes bbb
df.withColumn("New", when(col("ccc") === "E", col("AAA")).otherwise(col("bbb"))).show()

See the attached screenshot for the output.


Upvotes: 0

dassum

Reputation: 5093

Using Spark SQL:

SELECT
  CASE
    WHEN ccc = 'E' THEN AAA
    ELSE bbb
  END AS new,
  ccc
FROM dataset;

Upvotes: 0

Vamsi Prabhala

Reputation: 49260

This can be done with `when`. Here is a PySpark solution:

from pyspark.sql.functions import when
# Rows whose ccc is neither 'E' nor 'R' get null, because there is no otherwise()
df.withColumn('New', when(df.ccc == 'E', df.AAA).when(df.ccc == 'R', df.bbb)).show()
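The Scala answer and this one differ on rows whose ccc value is neither E nor R. `otherwise(bbb)` sends those rows to bbb. Chained `when` calls without `otherwise` produce null for them. The sketch models that difference in plain Python rather than Spark, so it runs without a cluster. The function names and the extra `X` row are hypothetical and exist only for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical row shape: (AAA, bbb, ccc)
Row = Tuple[str, str, str]


def with_otherwise(row: Row) -> Optional[str]:
    """Models when(ccc == 'E', AAA).otherwise(bbb)."""
    aaa, bbb, ccc = row
    if ccc == "E":
        return aaa
    return bbb  # every non-'E' value falls through to bbb


def chained_when(row: Row) -> Optional[str]:
    """Models when(ccc == 'E', AAA).when(ccc == 'R', bbb) with no otherwise."""
    aaa, bbb, ccc = row
    if ccc == "E":
        return aaa
    if ccc == "R":
        return bbb
    return None  # Spark yields null when no branch matches


# "X" is a hypothetical value, not in the question's data
rows: List[Row] = [("AAA", "BBB", "E"), ("AAA", "BBB", "R"), ("AAA", "BBB", "X")]
print([with_otherwise(r) for r in rows])  # ['AAA', 'BBB', 'BBB']
print([chained_when(r) for r in rows])    # ['AAA', 'BBB', None]
```

If ccc can only ever be E or R, both forms give the same result. `otherwise` is the safer choice when you want every row to get a value.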

Upvotes: 5
