Reputation: 353
I have a column "students" with the following schema:
root
|-- t1: integer (nullable = true)
|-- t2: integer (nullable = true)
|-- StudentsInfo: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- rollNumber: integer (nullable = true)
| | |-- complaints: map (nullable = true)
| | | |-- key: string
| | | |-- value: struct (valueContainsNull = true)
| | | | |-- severityOfComplaintX: integer (nullable = true)
| | | | |-- numInstancesofComplaintX: integer (nullable = true)
I want to transform this "studentInfo" column into two derived columns, each of type Map: "compaintSeverityOfComplaintX" and "compaintNumInstancesofComplaintX".
Understanding the query itself may not be important here. It is a working query that derives two columns (of type Map) from a column of type "Students".
The problem arises when the value of the "studentInfo" column is NULL: the query skips the whole row (as expected).
I want to update my SQL query so that when the value of the "studentInfo" column for a row is NULL, it adds an empty Map as the value of the derived columns "compaintSeverityOfComplaintX" and "compaintNumInstancesofComplaintX".
What is the best way to handle null values here? Something like:
For row-i:
    when "students" == null:
        set newly derived column compaintSeverityOfComplaintX = empty Map
        set newly derived column compaintNumInstancesofComplaintX = empty Map
    else:
        run above SQL to set proper values for newly derived columns compaintSeverityOfComplaintX and compaintNumInstancesofComplaintX
Update: I tried adding a dummy studentInfo value, but it raises an error:
withColumn("students", when($"students".isNull, typedLit(Seq.empty[Any])).otherwise($"students"))
Error: java.lang.RuntimeException: Unsupported literal type class scala.collection.immutable.Nil$ List()
Upvotes: 0
Views: 1656
Reputation: 2108
Let's say, for example, that you know the type of the new derived column, which in your case is Map[K,V].
You can try something like this:
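import org.apache.spark.sql.functions.{col, typedLit, when}
// joinMap is assumed to be the function from your existing query that builds the map.
val derivedColumn = joinMap(col("severityOfComplaintXMapList"))
dataframe.withColumn("compaintSeverityOfComplaintX", when(col("students").isNull, typedLit(Map.empty[String, Int])).otherwise(derivedColumn))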
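To make the idea concrete, here is a minimal self-contained sketch of the same when/otherwise pattern on toy data. The "derived" column and the helper name withEmptyMapDefault are hypothetical stand-ins for the output of your real query, and Map[String, Int] is only an assumed value type. The attempt in the question presumably fails because Spark cannot find a SQL type for Seq.empty[Any], whereas a fully typed empty Map gives typedLit a concrete type to work with.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, typedLit, when}

object EmptyMapDefault {
  // Uses an empty Map[String, Int] wherever "students" is null.
  // "derived" is a hypothetical column standing in for the real query's result.
  def withEmptyMapDefault(df: DataFrame): DataFrame =
    df.withColumn(
      "compaintSeverityOfComplaintX",
      when(col("students").isNull, typedLit(Map.empty[String, Int]))
        .otherwise(col("derived"))
    )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("empty-map-default")
      .getOrCreate()
    import spark.implicits._

    // Toy data: the second row has a null "students" value and a null derived map.
    val rows: Seq[(Int, Option[String], Option[Map[String, Int]])] = Seq(
      (1, Some("s1"), Some(Map("late" -> 2))),
      (2, None, None)
    )
    val df = rows.toDF("t1", "students", "derived")

    withEmptyMapDefault(df).show(false)
    spark.stop()
  }
}
```

The same expression can be repeated for "compaintNumInstancesofComplaintX". Note that this only helps if the row is still present at the point where the column is added; if an earlier step of the query already dropped it, that step needs to change as well.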
Upvotes: 1