DroppingOff

Reputation: 331

Spark SQL not recognizing null values after split

I have data and an issue similar to the question asked here: Spark sql how to explode without losing null values

I have used the solution proposed for Spark <= 2.1 (the snippet below is the Scala version from that answer), and indeed the null values do appear in my data after the split, but as literals:

df.withColumn("likes", explode(
  when(col("likes").isNotNull, col("likes"))
    // If null explode an array<string> with a single null
    .otherwise(array(lit(null).cast("string")))))

The issue is that afterwards I need to check whether there are null values in that column and take an action in that case. When I run my code, the nulls that were inserted as literals are recognized as strings instead of null values.

So the code below always puts 0 in every row, even when the row has a null in that column:

from pyspark.sql import functions as f

df.withColumn("origin", f.when(f.col("likes").isNotNull(), 0).otherwise(2)).show()

+--------+------+
|   likes|origin|
+--------+------+
|    CARS|     0|
|    CARS|     0|
|    null|     0|
|    null|     0|
+--------+------+

I use PySpark on Cloudera.
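To make the problem concrete, here is a minimal sketch of what I believe is going on (invented two-row sample; it assumes an active SparkSession named spark): the "null" cells are four-character strings, not SQL NULLs.

from pyspark.sql import functions as f

df = spark.createDataFrame([("CARS",), ("null",)], ["likes"])
df.filter(f.col("likes").isNull()).count()    # 0 -- no real NULLs in the column
df.filter(f.col("likes") == "null").count()   # 1 -- the literal string "null"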

Upvotes: 1

Views: 1330

Answers (2)

DroppingOff

Reputation: 331

I actually found a way. Since this is Python, the otherwise has to use None (Python's null) instead of the Scala null:

.otherwise(array(lit(None).cast("string")))
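Put together, a full PySpark version of the snippet from the question would look like this (a sketch, assuming the same df and likes column as above):

from pyspark.sql.functions import array, col, explode, lit, when

df = df.withColumn("likes", explode(
    when(col("likes").isNotNull(), col("likes"))
        # lit(None) produces a real SQL NULL, so isNull()/isNotNull() behave as expected
        .otherwise(array(lit(None).cast("string")))))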

Upvotes: 0

user10512437

Reputation: 26

You could hack around this by using a UDF:

val empty = udf(() => null: String)

df.withColumn("likes", explode(
  when(col("likes").isNotNull, col("likes"))
    // If null explode an array<string> with a single null
    .otherwise(array(empty()))))
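Since the question uses PySpark, a rough equivalent of the same trick there would be (a sketch, not tested against the asker's data):

from pyspark.sql.functions import array, col, explode, udf, when
from pyspark.sql.types import StringType

# A zero-argument UDF that returns None, which Spark stores as a SQL NULL
empty = udf(lambda: None, StringType())

df = df.withColumn("likes", explode(
    when(col("likes").isNotNull(), col("likes"))
        .otherwise(array(empty()))))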

Upvotes: 1
