Hi, I have this DataFrame (runnerListByPositionDataframe):
+------------+---------------------------------+
|runner |positions |
+------------+---------------------------------+
|azerty |[10, 8, 11,, 1, 5, 4, 1, 9, 7, 1]|
+------------+---------------------------------+
I'm trying to split the positions array on a given number. For example, I need to get:
+------------+----------------------------------------+
|runner |positions |
+------------+----------------------------------------+
|azerty |[[10, 8, 11,, 1] , [5, 4, 1], [9, 7, 1]]|
+------------+----------------------------------------+
Each time a position equals 1, I want to close the current sub-array (including that 1) and start a new one, so that I end up with an array of arrays.
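The grouping rule just described can be sketched in plain Scala, outside Spark: every group ends with (and includes) a 1, and any values left after the last 1 form a final group. This is a minimal sketch under assumptions: the helper name splitAfter is hypothetical, and nulls are modelled as None rather than as raw nulls.

```scala
// Hypothetical helper (not from the question): cut a sequence after every
// occurrence of `delimiter`, keeping the delimiter at the end of its group.
object SplitAfter {
  def splitAfter[A](xs: Seq[A], delimiter: A): Seq[Seq[A]] = {
    val groups = scala.collection.mutable.ArrayBuffer.empty[Seq[A]]
    val current = scala.collection.mutable.ArrayBuffer.empty[A]
    for (x <- xs) {
      current += x
      if (x == delimiter) {
        groups += current.toList // close the group, delimiter included
        current.clear()
      }
    }
    if (current.nonEmpty) groups += current.toList // trailing values without a delimiter
    groups.toList
  }

  // Assumption: the null from the question is represented as None here.
  val positions: Seq[Option[Int]] =
    Seq(Some(10), Some(8), Some(11), None, Some(1), Some(5), Some(4), Some(1), Some(9), Some(7), Some(1))

  def main(args: Array[String]): Unit =
    println(splitAfter(positions, Some(1)))
}
```

Because it works on ordinary Seq values, a function like this could in principle be called from a Dataset map or a UDF; that wiring is not shown here.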
To do that in Spark, I tried:
val result: Dataset[(Seq[Int], Seq[Int])] = runnerListByPositionDataframe.map((runner: Row) => {
  val positions: Seq[Int] = runner.getAs[Seq[Int]]("positions")
  val positionsSplited: (Seq[Int], Seq[Int]) = positions.splitAt(positions.indexWhere(x => {
    x == 0
  }))
  positionsSplited
})
result.show(false)
But instead I get:
+-----------+-----------------------+
|_1 |_2 |
+-----------+-----------------------+
|[10, 8, 11]|[, 1, 5, 4, 1, 9, 7, 1]|
+-----------+-----------------------+
Can someone help?
Thanks.
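The output above can be explained without Spark. The code searches for 0, which does not occur in the data; the element it actually matches is the null, because comparing a null element as an Int unboxes it to 0. And splitAt always makes exactly one cut and returns a tuple, which Spark encodes as the two columns _1 and _2. A minimal plain-Scala reproduction, assuming getAs[Seq[Int]] hands back the boxed elements with the null still in place (imitated here with an unchecked cast); the object name is hypothetical:

```scala
// Plain-Scala reproduction (not Spark) of the question's map() body.
object WhySplitAtFails {
  // Assumption: Row.getAs[Seq[Int]] returns boxed elements with the null
  // intact; the unchecked cast below imitates that.
  val positions: Seq[Int] =
    Seq[Any](10, 8, 11, null, 1, 5, 4, 1, 9, 7, 1).asInstanceOf[Seq[Int]]

  // Comparing the null element as an Int unboxes it to 0, so this finds index 3.
  val idx: Int = positions.indexWhere(x => x == 0)

  // splitAt cuts exactly once and returns a pair, hence the _1 / _2 columns.
  val (before, after) = positions.splitAt(idx)

  def main(args: Array[String]): Unit = {
    println(idx)
    println(before)
    println(after)
  }
}
```

Splitting into more than two pieces therefore needs repeated cuts (a loop or a fold) rather than a single splitAt.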
With Spark >= 2.4 you can do this with the SQL higher-order function TRANSFORM:
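import org.apache.spark.sql.functions.expr
import spark.implicits._ // needed for toDF; assumes a SparkSession named spark

val df = Seq("azerty").toDF("runner")
  .withColumn("positions", expr("array(10, 8, 11, null, 1, 5, 4, 1, 9, 7, 1)"))

// Join to "10#8#11##1#5#4#1#9#7#1" (null -> ''), mark each '#1#' as '#1$',
// split on '$', then split each group on '#'. Result type: array<array<string>>.
df.withColumn("x",
    expr("TRANSFORM(split(replace(array_join(positions, '#', ''), '#1#', '#1$'), '[$]')," +
      " x -> split(x, '[#]'))"))
  .show(false)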
/**
* +------+---------------------------------+----------------------------------------+
* |runner|positions |x |
* +------+---------------------------------+----------------------------------------+
* |azerty|[10, 8, 11,, 1, 5, 4, 1, 9, 7, 1]|[[10, 8, 11, , 1], [5, 4, 1], [9, 7, 1]]|
* +------+---------------------------------+----------------------------------------+
*
*/
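To see why the expression works, and where it stops working, the same steps can be imitated in plain Scala (this is not Spark code): mkString with nulls mapped to "" stands in for array_join(positions, '#', ''), Java's literal String.replace for SQL replace, and String.split(regex, -1) for SQL split. The object and function names are hypothetical.

```scala
// Plain-Scala imitation of the answer's SQL pipeline (not Spark code).
object JoinReplaceSplitTrace {
  def splitWithStringTrick(xs: Seq[Any]): Seq[Seq[String]] = {
    // array_join(positions, '#', ''): nulls become empty strings
    val joined = xs.map(x => if (x == null) "" else x.toString).mkString("#")
    // replace(..., '#1#', '#1$'): literal, non-overlapping, left-to-right
    val marked = joined.replace("#1#", "#1$")
    // split(..., '[$]') then TRANSFORM(x -> split(x, '[#]')); -1 keeps empty pieces
    marked.split("[$]", -1).toSeq.map(group => group.split("[#]", -1).toSeq)
  }

  val positions: Seq[Any] = Seq(10, 8, 11, null, 1, 5, 4, 1, 9, 7, 1)

  def main(args: Array[String]): Unit = {
    println(splitWithStringTrick(positions))
    println(splitWithStringTrick(Seq(5, 1, 1, 2))) // adjacent 1s are not separated
  }
}
```

Two consequences are worth knowing, at least for this sketch and presumably for the SQL version as well: the groups come back as strings (the null becomes an empty string), and the marker '#1#' needs a '#' on both sides, so a leading 1 or two adjacent 1s are not split as intended. For the input 5, 1, 1, 2 the sketch yields [5, 1], [1, 2] instead of [5, 1], [1], [2].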