Reputation: 383
I am trying to understand if there is a default method available in Spark - scala to include empty strings in coalesce.
Ex- I have the below DF with me -
val df2=Seq(
("","1"),
("null","15_20")
)toDF("c1","c2")
+----+-----+
| c1| c2|
+----+-----+
| | 1|
|null|15_20|
+----+-----+
The below code will work only for NULL values. But I require the coalesce to work for empty strings as well.
df2.withColumn("FirstNonNullOrBlank",coalesce(col("c1"),col("c2")))show
+----+-----+-------------------+
| c1| c2|FirstNonNullOrBlank|
+----+-----+-------------------+
| | 1| |
|null|15_20| 15_20|
+----+-----+-------------------+
Expected Output -
+----+-----+-------------------+
| c1| c2|FirstNonNullOrBlank|
+----+-----+-------------------+
| | 1| 1 |
|null|15_20| 15_20|
+----+-----+-------------------+
What should be the best approach here ?
Upvotes: 2
Views: 3189
Reputation: 27373
you need a helper function to "nullify" these records :
def nullify(c: Column) = when(not (c==="" or c==="null"),c)
df2
.withColumn("FirstNonNullOrBlank", coalesce(
nullify(col("c1")),
nullify(col("c2")))
)
.show
+----+-----+-------------------+
| c1| c2|FirstNonNullOrBlank|
+----+-----+-------------------+
| | 1| 1|
|null|15_20| 15_20|
+----+-----+-------------------+
Upvotes: 2