Scala_Beginner

Reputation: 131

If-If statement Scala Spark

I have a dataframe for which I need to create a new column based on the values in existing columns. The catch is that I can't use a CASE statement, because CASE checks the WHEN conditions in order and stops at the first one that is satisfied; it only moves on to the next WHEN if the current one fails. For example, consider this dataframe:

+-+-----+-+
|A|B    |C|
+-+-----+-+
|1|true |1|-----> Conditions 1 and 2 are both satisfied here
|1|true |0|-----> Only condition 1 is satisfied here
|1|false|1|
|2|true |1|
|2|true |0|
+-+-----+-+

Consider this CASE statement:

CASE WHEN A = 1 AND B = 'true' THEN 'A'
     WHEN A = 1 AND B = 'true' AND C = 1 THEN 'B'
END

It never gives me a row with the value B, because every row that satisfies the second condition has already matched the first one.
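To make the difference concrete, here is a minimal sketch in plain Scala (no Spark involved) that models the two behaviours. The names `Record`, `rules`, `firstMatch` and `allMatches` are made up for illustration; this is only a model of the semantics, not how Spark evaluates CASE internally.

```scala
// Plain-Scala model of "first match wins" vs. "every match counts".
// Record, rules, firstMatch and allMatches are illustrative names only.
final case class Record(a: Int, b: Boolean, c: Int)

object CaseSemantics {
  // Ordered (predicate, label) pairs mirroring the two WHEN branches.
  val rules: Seq[(Record => Boolean, String)] = Seq(
    ((r: Record) => r.a == 1 && r.b, "A"),
    ((r: Record) => r.a == 1 && r.b && r.c == 1, "B")
  )

  // CASE WHEN behaviour: stop at the first branch whose predicate holds.
  def firstMatch(r: Record): Option[String] =
    rules.collectFirst { case (p, label) if p(r) => label }

  // "If-if" behaviour: keep the label of every branch whose predicate holds.
  def allMatches(r: Record): Seq[String] =
    rules.collect { case (p, label) if p(r) => label }

  def main(args: Array[String]): Unit = {
    val row = Record(1, true, 1)
    println(firstMatch(row)) // Some(A)
    println(allMatches(row)) // List(A, B)
  }
}
```

What I am after is the `allMatches` behaviour, with one output row per matched label.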

Expected output:

+-+-----+-+----+
|A|B    |C|D   |
+-+-----+-+----+
|1|true |1|A   |
|1|true |1|B   |
|1|true |0|A   |
|1|false|1|null|
|2|true |1|null|
|2|true |0|null|
+-+-----+-+----+

I know I can derive this in two separate dataframes and then union them, but I am looking for a more efficient solution.

Upvotes: 0

Views: 1226

Answers (1)

ZygD

Reputation: 24386

Creating the dataframe:

import spark.implicits._  // already in scope in spark-shell; needed elsewhere for toDF and $"..."
val df1 = Seq((1, true, 1), (1, true, 0), (1, false, 1), (2, true, 1), (2, true, 0)).toDF("A", "B", "C")
df1.show()
//  +---+-----+---+
//  |  A|    B|  C|
//  +---+-----+---+
//  |  1| true|  1|
//  |  1| true|  0|
//  |  1|false|  1|
//  |  2| true|  1|
//  |  2| true|  0|
//  +---+-----+---+

The code:

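import org.apache.spark.sql.functions._  // array, when, element_at, slice, explode

val condition1 = ($"A" === 1) && ($"B" === true)
val condition2 = condition1 && ($"C" === 1)
// One slot per condition; a slot is null when its condition is not met.
val arr1 = array(when(condition1, "A"), when(condition2, "B"))
// If the second slot is null, keep only the first slot so no extra null row appears.
val arr2 = when(element_at(arr1, 2).isNull, slice(arr1, 1, 1)).otherwise(arr1)
// explode emits one output row per remaining array element.
val df2 = df1.withColumn("D", explode(arr2))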

df2.show()
//  +---+-----+---+----+
//  |  A|    B|  C|   D|
//  +---+-----+---+----+
//  |  1| true|  1|   A|
//  |  1| true|  1|   B|
//  |  1| true|  0|   A|
//  |  1|false|  1|null|
//  |  2| true|  1|null|
//  |  2| true|  0|null|
//  +---+-----+---+----+
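If there are more than two conditions, slicing by position gets awkward. A possible variation, sketched here under the assumption of Spark 3.0 or later (where the `filter` higher-order function is available in the Scala API), is to drop the null slots from the array and use `explode_outer`, which produces a single null row when the array is empty. The object name `IfIfExample` and the helper `withAllMatches` are made up for illustration, and the helper uses `col(...)` instead of `$"..."` so it does not depend on the session implicits.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{array, col, explode_outer, filter, when}

object IfIfExample {
  // Hypothetical helper: adds column D with one row per satisfied condition.
  def withAllMatches(df: DataFrame): DataFrame = {
    val condition1 = (col("A") === 1) && (col("B") === true)
    val condition2 = condition1 && (col("C") === 1)

    // One slot per condition; a slot is null when its condition is not met.
    val labels = array(when(condition1, "A"), when(condition2, "B"))
    // Keep only the labels whose condition held (requires Spark 3.0+).
    val matched = filter(labels, x => x.isNotNull)
    // explode_outer yields one row per label, or one row with a null D
    // when no condition held and the array is empty.
    df.withColumn("D", explode_outer(matched))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("if-if-sketch")
      .getOrCreate()
    import spark.implicits._

    val df1 = Seq((1, true, 1), (1, true, 0), (1, false, 1), (2, true, 1), (2, true, 0))
      .toDF("A", "B", "C")

    withAllMatches(df1).show()
    spark.stop()
  }
}
```

This should give the same six rows as the table above. Adding a third condition would then only mean adding another `when(...)` to the array.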

Upvotes: 1
