Reputation: 5813
I have a schema where each row contains multiple array columns, and I want to explode each array column independently of the others.
Suppose we have the columns:
userId someString varA varB someBool
1 "example1" [0,2,5] [1,2,9] true
2 "example2" [1,20,5] [9,null,6] false
I want an output of:
userId someString varA varB someBool
1 "example1" 0 null true
1 "example1" 2 null true
1 "example1" 5 null true
2 "example2" 1 null false
2 "example2" 20 null false
2 "example2" 5 null false
1 "example1" null 1 true
1 "example1" null 2 true
1 "example1" null 9 true
2 "example2" null 9 false
2 "example2" null null false
2 "example2" null 6 false
Ideas?
(Oh, and I'm trying to do this generically so I don't have to update the code as the schema changes, and also because the actual schema is kinda large...)
PS - Props to this very similar but different question from which I shamelessly stole the example data.
Edit: @oliik with a win, but it would ALSO be awesome to see a way to do this with df.flatMap (mostly because I still don't grok flatMap).
Upvotes: 2
Views: 332
Reputation: 4540
You can always generate the select programmatically:
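import org.apache.spark.sql.functions.{explode, lit}
import org.apache.spark.sql.types.{ArrayType, StructField}
// Assumes a SparkSession named `spark`, as in spark-shell (needed for toDF)
import spark.implicits._

val df = Seq(
  (1, "example1", Seq(0,2,5), Seq(Some(1),Some(2),Some(9)), true),
  (2, "example2", Seq(1,20,5), Seq(Some(9),Option.empty[Int],Some(6)), false)
).toDF("userId", "someString", "varA", "varB", "someBool")

// Names of all array-typed columns, read from the schema
val arrayColumns = df.schema.fields.collect {
  case StructField(name, ArrayType(_, _), _, _) => name
}

// One DataFrame per array column: explode that column, null out the other arrays
val dfs = arrayColumns.map { expname =>
  val columns = df.schema.fields.map {
    case StructField(name, ArrayType(_, _), _, _) if expname == name => explode(df.col(name)) as name
    case StructField(name, ArrayType(_, _), _, _) => lit(null) as name
    case StructField(name, _, _, _) => df.col(name)
  }
  df.select(columns:_*)
}

// Stack the per-column results into a single DataFrame
dfs.reduce(_ union _).show()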
+------+----------+----+----+--------+
|userId|someString|varA|varB|someBool|
+------+----------+----+----+--------+
| 1| example1| 0|null| true|
| 1| example1| 2|null| true|
| 1| example1| 5|null| true|
| 2| example2| 1|null| false|
| 2| example2| 20|null| false|
| 2| example2| 5|null| false|
| 1| example1|null| 1| true|
| 1| example1|null| 2| true|
| 1| example1|null| 9| true|
| 2| example2|null| 9| false|
| 2| example2|null|null| false|
| 2| example2|null| 6| false|
+------+----------+----+----+--------+
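As for the flatMap part of the question: the idea is that flatMap turns each input row into zero or more output rows and then concatenates the results. Here is a minimal sketch of that logic on plain Scala collections rather than on a DataFrame. The names `ExplodeSketch`, `Row` and `explodeIndependently` are made up for illustration, rows are modelled as `Vector[Any]`, and any `Seq` value is assumed to be an array column. It is not a drop-in Spark implementation; with Spark you would still need to supply the output schema or an encoder.

```scala
// Plain-Scala sketch (no Spark). Hypothetical names, for illustration only.
object ExplodeSketch {
  // A row is an ordered list of values; array columns hold a Seq.
  type Row = Vector[Any]

  // For each array column, emit one output row per element. In that row the
  // other array columns are set to null and scalar columns are copied as-is.
  def explodeIndependently(row: Row): Seq[Row] = {
    // Assumption: any Seq value marks an array column (a String is not a Seq).
    val arrayIdx = row.indices.filter(i => row(i).isInstanceOf[Seq[_]])
    arrayIdx.flatMap { target =>
      row(target).asInstanceOf[Seq[Any]].map { elem =>
        row.indices.map { i =>
          if (i == target) elem
          else if (arrayIdx.contains(i)) null
          else row(i)
        }.toVector
      }
    }
  }

  // Same example data as in the question.
  val rows: Seq[Row] = Seq(
    Vector(1, "example1", Seq(0, 2, 5), Seq(1, 2, 9), true),
    Vector(2, "example2", Seq(1, 20, 5), Seq(9, null, 6), false)
  )

  // flatMap applies the function to every row and concatenates the results.
  val exploded: Seq[Row] = rows.flatMap(explodeIndependently)
}
```

Note that the row order differs from the union version: here all output rows for the first input row come first (varA, then varB), followed by those for the second input row.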
Upvotes: 6