Eddie
Eddie

Reputation: 31

Concatenate multiple columns with if in spark scala

I have to concatenate multiple columns with a condition in spark Scala, but it doesn't work with the "if". I have following DataFrame:

table:
+---+----+----+  
|  a|   b|   c|
+---+----+----+  
|  0|   1|   1|
|  1|   0|   0|
|  1|   1|   0|
+---+----+----+

table.withColumn("concat", concat_ws(", ", (if($"a"===1){lit("D")} else{null}),
                                           (if($"b"===1){lit("E")} else{null}),
                                           (if($"c"===1){lit("F")} else{null})))

Below is the final required result.

+---+----+----+------+ 
|  a|   b|   c|concat|
+---+----+----+------+  
|  0|   1|   1|  E, F|
|  1|   0|   0|     D|
|  1|   1|   0|  D, E|
+---+----+----+------+

I wouldn't create other columns like this:

val ftable = (table.withColumn("D", when ($"a"===1, lit("D")))
                    .withColumn("E", when ($"b"===1, lit("E")))
                    .withColumn("F", when ($"c"===1, lit("F"))))                                 

val columnselection = ftable.select($"D", $"E" , $"F" )
val selection = columnselection.columns.map(col)
val animaliCol = ftable.select(ftable.col("*"), concat_ws(", ", selection : _*).as("concat"))

Upvotes: 1

Views: 1450

Answers (2)

s.polam
s.polam

Reputation: 10382

Used concat_ws , when & otherwise functions.

scala> df.show(false)
+---+---+---+
|a  |b  |c  |
+---+---+---+
|0  |1  |1  |
|1  |0  |0  |
|1  |1  |0  |
+---+---+---+

Creating required columns & its default values.

scala> val expressions = concat_ws(", ",Seq(("a","D"),("b","E"),("c","F")).map(c => when(col(c._1) === 1, lit(c._2)).otherwise(null)):_*) // Create column names & its default values expressions.
expressions: org.apache.spark.sql.Column = concat_ws(, , CASE WHEN (a = 1) THEN D ELSE NULL END, CASE WHEN (b = 1) THEN E ELSE NULL END, CASE WHEN (c = 1) THEN F ELSE NULL END)

Final output

scala> df.withColumn("concat",expressions).show(false)
+---+---+---+------+
|a  |b  |c  |concat|
+---+---+---+------+
|0  |1  |1  |E, F  |
|1  |0  |0  |D     |
|1  |1  |0  |D, E  |
+---+---+---+------+

Upvotes: 0

chlebek
chlebek

Reputation: 2451

You should replace if with when and otherwise.

 import org.apache.spark.sql.functions._

 table.withColumn("concat", concat_ws(", ",
      when($"a"===1,lit("D")).otherwise(null),
      when($"b"===1,lit("E")).otherwise(null),
      when($"c"===1,lit("F")).otherwise(null)
    )).show()

output:

+---+---+---+------+
|  a|  b|  c|concat|
+---+---+---+------+
|  0|  1|  1|  E, F|
|  1|  0|  0|     D|
|  1|  1|  0|  D, E|
+---+---+---+------+

Upvotes: 1

Related Questions