user1361815

Reputation: 353

Cannot split an array by a special char in Spark

Hi, I have this DataFrame (runnerListByPositionDataframe):

+------------+---------------------------------+
|runner      |positions                        |
+------------+---------------------------------+
|azerty      |[10, 8, 11,, 1, 5, 4, 1, 9, 7, 1]|
+------------+---------------------------------+

I'm trying to split the positions array on a given number. For example, I need to get:

+------------+----------------------------------------+
|runner      |positions                               |
+------------+----------------------------------------+
|azerty      |[[10, 8, 11,, 1] , [5, 4, 1], [9, 7, 1]]|
+------------+----------------------------------------+

After every position equal to 1, I start a new array, so that I end up with an array of arrays.

To do that, I tried:

val result: Dataset[(Seq[Int], Seq[Int])] = runnerListByPositionDataframe.map((runner: Row) => {
  val positions: Seq[Int] = runner.getAs[Seq[Int]]("positions")
  val positionsSplited: (Seq[Int], Seq[Int]) = positions.splitAt(positions.indexWhere(x => {
    x == 0
  }))
  positionsSplited
})

result.show(false)

But instead I'm getting:

+-----------+-----------------------+
|_1         |_2                     |
+-----------+-----------------------+
|[10, 8, 11]|[, 1, 5, 4, 1, 9, 7, 1]|
+-----------+-----------------------+

Can someone help?

Thanks

Upvotes: 1

Views: 49

Answers (1)

Som

Reputation: 6323

Spark >= 2.4

A brute-force approach that I can think of to get the required output:

import org.apache.spark.sql.functions.expr
import spark.implicits._ // assumes a SparkSession named `spark` is in scope (needed for toDF)

val df = Seq("azerty").toDF("runner")
  .withColumn("positions", expr("array(10, 8, 11, null, 1, 5, 4, 1, 9, 7, 1)"))

// Join the array into one '#'-separated string (nulls become ''),
// turn the separator after each 1 into '$', split on '$',
// then split every chunk back into elements on '#'.
df.withColumn("x",
  expr("TRANSFORM(split(replace(array_join(positions, '#', ''), '#1#', '#1$'), '[$]'), " +
    "x -> split(x, '[#]'))"))
  .show(false)

/**
  * +------+---------------------------------+----------------------------------------+
  * |runner|positions                        |x                                       |
  * +------+---------------------------------+----------------------------------------+
  * |azerty|[10, 8, 11,, 1, 5, 4, 1, 9, 7, 1]|[[10, 8, 11, , 1], [5, 4, 1], [9, 7, 1]]|
  * +------+---------------------------------+----------------------------------------+
  */
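
For comparison, here is a minimal sketch of the same "close a chunk after every 1" rule in plain Scala, without going through strings. The string-based expression returns arrays of strings rather than integers, and as far as I can tell the '#1#' pattern does not match a 1 at the very start of the array or the second of two consecutive 1s. The attempt in the question, for its part, uses splitAt, which only splits once, and tests x == 0 rather than x == 1. The names SplitAfterDelimiter and splitAfter are made up for illustration, and modelling null positions as None is an assumption about how you would read the column.

```scala
object SplitAfterDelimiter {
  // Splits `xs` into chunks, closing the current chunk right after each
  // element equal to `delimiter`. Missing (null) positions are modelled as None.
  def splitAfter(xs: Seq[Option[Int]], delimiter: Int): Seq[Seq[Option[Int]]] = {
    val start = (Vector.empty[Vector[Option[Int]]], Vector.empty[Option[Int]])
    val (done, current) = xs.foldLeft(start) {
      case ((chunks, chunk), x) =>
        val extended = chunk :+ x
        if (x.contains(delimiter)) (chunks :+ extended, Vector.empty[Option[Int]])
        else (chunks, extended)
    }
    // Keep trailing elements that were not followed by a delimiter.
    if (current.isEmpty) done else done :+ current
  }

  def main(args: Array[String]): Unit = {
    val positions: Seq[Option[Int]] = Seq(
      Some(10), Some(8), Some(11), None, Some(1),
      Some(5), Some(4), Some(1),
      Some(9), Some(7), Some(1))
    // Three chunks, each ending with Some(1).
    println(splitAfter(positions, 1))
  }
}
```

To use something like this from Spark you would still have to get the column into that shape, for example by reading it as Seq[java.lang.Integer] inside a map and wrapping each element with Option(...), so that nulls are kept instead of being unboxed to 0.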

Upvotes: 1
