mzlo

Reputation: 353

In Scala Spark, how do I add default values for derived columns when the source column is NULL?

I have a column "students" with the following schema:

    root
     |-- t1: integer (nullable = true)
     |-- t2: integer (nullable = true)
     |-- StudentsInfo: array (nullable = true)
     |    |-- element: struct (containsNull = true)
     |    |    |-- rollNumber: integer (nullable = true)
     |    |    |-- complaints: map (nullable = true)
     |    |    |    |-- key: string
     |    |    |    |-- value: struct (valueContainsNull = true)
     |    |    |    |    |-- severityOfComplaintX: integer (nullable = true)
     |    |    |    |    |-- numInstancesofComplaintX: integer (nullable = true)
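To reproduce the problem locally, the schema above can be built by hand. This is a minimal sketch, assuming a local SparkSession (spark-shell or a Scala script with Spark on the classpath). The sample rows, including the "noise" complaint key, are made up for illustration; the second row has a NULL StudentsInfo.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types._

val spark = SparkSession.builder().master("local[1]").appName("schema-sketch").getOrCreate()

// Struct stored as the value of the complaints map.
val complaintType = StructType(Seq(
  StructField("severityOfComplaintX", IntegerType, nullable = true),
  StructField("numInstancesofComplaintX", IntegerType, nullable = true)
))

// One element of the StudentsInfo array.
val studentType = StructType(Seq(
  StructField("rollNumber", IntegerType, nullable = true),
  StructField("complaints", MapType(StringType, complaintType, valueContainsNull = true), nullable = true)
))

val schema = StructType(Seq(
  StructField("t1", IntegerType, nullable = true),
  StructField("t2", IntegerType, nullable = true),
  StructField("StudentsInfo", ArrayType(studentType, containsNull = true), nullable = true)
))

// Hypothetical sample data: the second row has a NULL StudentsInfo.
val rows = Seq(
  Row(1, 2, Seq(Row(10, Map("noise" -> Row(3, 2))))),
  Row(3, 4, null)
)
val df = spark.createDataFrame(spark.sparkContext.parallelize(rows), schema)
df.printSchema()
```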

I want to transform the "StudentsInfo" column into two derived columns, each of type Map: "compaintSeverityOfComplaintX" and "compaintNumInstancesofComplaintX".

Understanding the query itself may not be important here; it is a working query that derives the two Map columns from the "students" column.

The problem is that when the "StudentsInfo" value is NULL, the whole row is skipped (as expected).
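The query itself isn't shown, so the cause here is an assumption: a common reason a row disappears is `explode`, which produces no output rows when the array is NULL or empty. If the query explodes StudentsInfo, swapping in `explode_outer` (available since Spark 2.2) keeps such rows, with a NULL element that can be defaulted afterwards. A minimal sketch on a simplified stand-in column (an array of integers instead of the real structs):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode, explode_outer}

val spark = SparkSession.builder().master("local[1]").appName("explode-nulls").getOrCreate()
import spark.implicits._

// Hypothetical stand-in data: row t1 = 2 has a NULL array.
val df = Seq((1, Option(Seq(10, 20))), (2, Option.empty[Seq[Int]])).toDF("t1", "StudentsInfo")

// explode emits nothing for the NULL array, so row t1 = 2 disappears.
val exploded = df.select(col("t1"), explode(col("StudentsInfo")).as("student"))

// explode_outer keeps row t1 = 2 and sets student to NULL.
val explodedOuter = df.select(col("t1"), explode_outer(col("StudentsInfo")).as("student"))

exploded.show()
explodedOuter.show()
```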

I want to update my SQL query so that when the "StudentsInfo" value for a row is NULL, it sets an empty Map as the value of both derived columns, "compaintSeverityOfComplaintX" and "compaintNumInstancesofComplaintX".

What is the best way to handle null values here? Something like:

For row-i:
    when "students" == null:
       set newly derived column compaintSeverityOfComplaintX = empty Map
       set newly derived column compaintNumInstancesofComplaintX = empty Map
    else
       run the existing SQL to set the proper values for the derived columns compaintSeverityOfComplaintX and compaintNumInstancesofComplaintX
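The pseudocode maps directly onto `when(...).otherwise(...)`. This is a minimal sketch, assuming both derived columns are `Map[String, Int]` (the question doesn't give their key and value types) and using a simplified `students` column. `severityMap` and `numInstancesMap` are hypothetical placeholders for the existing SQL, not the real derivation.

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.{col, lit, map, size, typedLit, when}

val spark = SparkSession.builder().master("local[1]").appName("null-defaults").getOrCreate()
import spark.implicits._

// Simplified stand-in data: row t1 = 2 has a NULL students value.
val df = Seq((1, Option(Seq(10, 20))), (2, Option.empty[Seq[Int]])).toDF("t1", "students")

// Hypothetical placeholders for the real derivation logic.
val severityMap: Column = map(lit("numStudents"), size(col("students")))
val numInstancesMap: Column = map(lit("numStudents"), size(col("students")))

// Typed empty map literal: the concrete key/value types give Spark a MapType
// that matches the otherwise branch.
val emptyMap: Column = typedLit(Map.empty[String, Int])

val result = df
  .withColumn("compaintSeverityOfComplaintX",
    when(col("students").isNull, emptyMap).otherwise(severityMap))
  .withColumn("compaintNumInstancesofComplaintX",
    when(col("students").isNull, emptyMap).otherwise(numInstancesMap))

result.show(false)
```

Because `when` without a match would otherwise yield NULL, the explicit `emptyMap` branch is what turns the NULL source into an empty map rather than a NULL map.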

Update: I tried replacing the NULL with a dummy value, but it gives an error:


    withColumn("students", when($"students".isNull, typedLit(Seq.empty[Any])).otherwise($"students"))

Error: java.lang.RuntimeException: Unsupported literal type class scala.collection.immutable.Nil$ List()
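The error appears to come from the element type rather than from `typedLit` itself: Spark cannot map `Any` to a SQL type, so it falls back to a plain literal, and a plain literal does not support an empty `List` (`Nil`). Giving the empty Seq a concrete element type avoids this. A sketch, assuming the array column is the top-level StudentsInfo from the schema above; the case classes `Complaint` and `StudentInfo` are hypothetical and simply mirror the element struct (with `Option[Int]` for nullable integers). Their field names must match the existing struct so that both branches of `when` have the same type.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, size, typedLit, when}

// Hypothetical case classes mirroring the element struct of StudentsInfo.
case class Complaint(severityOfComplaintX: Option[Int], numInstancesofComplaintX: Option[Int])
case class StudentInfo(rollNumber: Option[Int], complaints: Map[String, Complaint])

val spark = SparkSession.builder().master("local[1]").appName("typed-empty-array").getOrCreate()
import spark.implicits._

// Hypothetical sample data: row t1 = 2 has a NULL StudentsInfo.
val student = StudentInfo(Some(10), Map("noise" -> Complaint(Some(3), Some(2))))
val df = Seq((1, Option(Seq(student))), (2, Option.empty[Seq[StudentInfo]])).toDF("t1", "StudentsInfo")

// typedLit(Seq.empty[Any]) fails: there is no SQL type for Any.
// With a concrete element type the literal becomes an empty array<struct<...>>.
val emptyStudents = typedLit(Seq.empty[StudentInfo])

val fixed = df.withColumn("StudentsInfo",
  when(col("StudentsInfo").isNull, emptyStudents).otherwise(col("StudentsInfo")))

fixed.show(false)
```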

Upvotes: 0

Views: 1656

Answers (1)

dumitru

Reputation: 2108

Let's say, for example, that you know the type of the new derived column, which in your case is Map[K, V].

You can try something like this:

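    import org.apache.spark.sql.functions.{col, typedLit, when}
    // joinMap and severityOfComplaintXMapList are assumed to come from your existing query
    val derivedColumn = joinMap(col("severityOfComplaintXMapList"))

    dataframe.withColumn("compaintSeverityOfComplaintX",
      when(col("students").isNull, typedLit(Map.empty[String, Int])).otherwise(derivedColumn))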
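A runnable version of this idea, for checking it end to end. The answer doesn't define `joinMap`; here it is a hypothetical UDF that merges an array of `Map[String, Int]` into one map (later keys win). The sample data and the string `students` placeholder column are also made up.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, typedLit, udf, when}

val spark = SparkSession.builder().master("local[1]").appName("answer-sketch").getOrCreate()
import spark.implicits._

// Hypothetical stand-in for joinMap: merges an array of maps into a single map,
// with later keys overwriting earlier ones.
val joinMap = udf { (maps: Seq[scala.collection.Map[String, Int]]) =>
  if (maps == null) Map.empty[String, Int]
  else maps.foldLeft(Map.empty[String, Int])(_ ++ _)
}

// Hypothetical sample data: row t1 = 2 has NULL students and a NULL map list.
val df = Seq(
  (1, Option("present"), Option(Seq(Map("noise" -> 3), Map("late" -> 1)))),
  (2, Option.empty[String], Option.empty[Seq[Map[String, Int]]])
).toDF("t1", "students", "severityOfComplaintXMapList")

val derivedColumn = joinMap(col("severityOfComplaintXMapList"))

// The empty-map literal and the UDF both produce map<string,int>, so when/otherwise type-checks.
val result = df.withColumn("compaintSeverityOfComplaintX",
  when(col("students").isNull, typedLit(Map.empty[String, Int])).otherwise(derivedColumn))

result.show(false)
```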

Upvotes: 1
