Reputation: 331
I have data and an issue similar to the question asked here: Spark sql how to explode without losing null values
I have used the solution proposed for Spark <=2.1, and indeed the null values appear as literals in my data after the split:
df.withColumn("likes", explode(
when(col("likes").isNotNull, col("likes"))
// If null explode an array<string> with a single null
.otherwise(array(lit(null).cast("string")))))
The issue is that afterwards I need to check whether there are null values in that column and take an action in that case. When I try to run my code, the nulls inserted as literals are recognized as strings instead of null values.
So this code below will always return 0 even if the row has a null in that column:
df.withColumn("likes", f.when(col('likes').isNotNull(), 0).otherwise(2)).show()
+--------+------+
|   likes|origin|
+--------+------+
|    CARS|     0|
|    CARS|     0|
|    null|     0|
|    null|     0|
+--------+------+
I use Cloudera PySpark.
Upvotes: 1
Views: 1330
Reputation: 331
I actually found a way. In the `otherwise` you have to write this:
.otherwise(array(lit(None).cast("string")))))
Upvotes: 0
Reputation: 26
You could hack this by using a udf:
val empty = udf(() => null: String)
df.withColumn("likes", explode(
when(col("likes").isNotNull, col("likes"))
// If null explode an array<string> with a single null
.otherwise(array(empty()))))
Upvotes: 1