Reputation: 1267
I have a Spark SQL question I'd appreciate some guidance on: what is the best way to do a conditional select from a nested array of structs?
I have an example JSON document below:
```
{
  "id": "p1",
  "externalIds": [
    {"system": "a", "id": "1"},
    {"system": "b", "id": "2"},
    {"system": "c", "id": "3"}
  ]
}
```
In Spark SQL I want to select the "id" of one of the array's structs based on some conditional logic.
E.g. for the document above, select the "id" field of the array element whose "system" is "b", namely the id "2".
What is the best way to do this in Spark SQL?
Cheers and thanks!
Upvotes: 2
Views: 1484
Reputation: 27373
Using a UDF, this could look like the following, given a DataFrame (all attributes of type String):
```
+---+---------------------+
|id |externalIds          |
+---+---------------------+
|p1 |[[a,1], [b,2], [c,3]]|
+---+---------------------+
```
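(For reference, a DataFrame like this can be built straight from the question's JSON. A minimal sketch, assuming an existing SparkSession named `spark`; the explicit schema pins the struct fields to the (system, id) order shown above, since schema inference would sort them alphabetically as (id, system):)
```
import org.apache.spark.sql.types._
import spark.implicits._

// Explicit schema so the struct comes out as (system, id), matching the
// display above; spark.read.json without a schema would infer (id, system).
val schema = StructType(Seq(
  StructField("id", StringType),
  StructField("externalIds", ArrayType(StructType(Seq(
    StructField("system", StringType),
    StructField("id", StringType)
  ))))
))

val df = spark.read.schema(schema).json(Seq(
  """{"id":"p1","externalIds":[{"system":"a","id":"1"},{"system":"b","id":"2"},{"system":"c","id":"3"}]}"""
).toDS)
```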
Define a UDF to traverse the array and find the desired element:
```
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.udf

def getExternal(system: String) = {
  udf((rows: Seq[Row]) =>
    rows
      .map(r => (r.getString(0), r.getString(1))) // positional access: struct fields are (system, id) as shown above
      .find { case (s, _) => s == system }
      .map(_._2) // Option[String]; a missing system yields null in the result column
  )
}
```
and use it like this:
```
df
  .withColumn("external", getExternal("b")($"externalIds"))
  .show(false)
```
```
+---+---------------------+--------+
|id |externalIds          |external|
+---+---------------------+--------+
|p1 |[[a,1], [b,2], [c,3]]|2       |
+---+---------------------+--------+
```
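As an aside: on Spark 2.4+ the same lookup can be done without a UDF, via the built-in `filter` higher-order function. A minimal sketch against the `df` above:
```
import org.apache.spark.sql.functions.expr

// Keep the structs whose "system" is "b", take the first match, and read its
// "id"; when nothing matches, indexing the empty array with [0] yields null.
df
  .withColumn("external", expr("filter(externalIds, x -> x.system = 'b')[0].id"))
  .show(false)
```
The same expression also works in plain SQL (e.g. after registering the DataFrame as a temp view with `df.createOrReplaceTempView`).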
Upvotes: 2