S.Kang

Reputation: 611

How to sort a Spark RDD of arrays in ascending order by any column in Scala?

I'm interested in Apache Spark.

I'm trying to sort a Spark RDD of arrays in ascending order by any column in Scala.

For example, given an RDD[Array[Int]] containing Array(Array(1,2,3), Array(2,3,4), Array(1,2,1)):

If I sort by the first column, the result should be Array(Array(1,2,3), Array(1,2,1), Array(2,3,4)). If I sort by the third column, the result should be Array(Array(1,2,1), Array(1,2,3), Array(2,3,4)). In both cases I want the result to be an RDD[Array[Int]]. Is there a way to do this, for example with map() or filter()?

Upvotes: 0

Views: 1836

Answers (2)

Balaji Reddy

Reputation: 5710

val baseRdd = sc.parallelize(Array(Array(1, 2, 3), Array(2, 3, 4), Array(1, 2, 1)))

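// false specifies descending order; pass true (the default) for ascending
val result = baseRdd.sortBy(x => x(1), false)

// collect() brings the rows to the driver so they print in sorted order
result.collect().foreach { x => println(x(0) + "\t" + x(1) + "\t" + x(2)) }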

Result

2 3 4

1 2 3

1 2 1
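For the ascending case asked about in the question, the same call can be wrapped in a small helper that takes the column index as a parameter. This is only a minimal sketch: `sortByColumn` and `SortByColumnSketch` are hypothetical names, not part of Spark, and the local master setting is an assumption for running it on a single machine.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object SortByColumnSketch {
  // Hypothetical helper (not a Spark API): sorts rows by the value at index `col`.
  // Assumes every row has at least col + 1 elements.
  def sortByColumn(rdd: RDD[Array[Int]], col: Int, ascending: Boolean = true): RDD[Array[Int]] =
    rdd.sortBy(row => row(col), ascending)

  def main(args: Array[String]): Unit = {
    // Assumption: local mode, for trying the sketch without a cluster.
    val sc = new SparkContext(new SparkConf().setAppName("sort-sketch").setMaster("local[*]"))
    try {
      val baseRdd = sc.parallelize(Seq(Array(1, 2, 3), Array(2, 3, 4), Array(1, 2, 1)))
      // Sort ascending by the third column (index 2) and print on the driver.
      sortByColumn(baseRdd, col = 2).collect().foreach(row => println(row.mkString("\t")))
    } finally {
      sc.stop()
    }
  }
}
```

With the sample data, sorting by index 2 should return the rows in the order (1, 2, 1), (1, 2, 3), (2, 3, 4), and the return type stays RDD[Array[Int]].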

Upvotes: 0

Tzach Zohar

Reputation: 37852

Use RDD.sortBy:

import org.apache.spark.rdd.RDD

// rdd is the RDD[Array[Int]] from the question
// sorting by second column (index = 1)
val result: RDD[Array[Int]] = rdd.sortBy(_(1), ascending = true)

The sorting function can also be written using pattern matching:

val result: RDD[Array[Int]] = rdd.sortBy( {
  case Array(a, b, c) => b /* choose column(s) to sort by */
}, ascending = true)

Also note the ascending argument's default value is true, so you can drop it and get the same result:

val result: RDD[Array[Int]] = rdd.sortBy(_(1))
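If more than one column should decide the order, one option is to return a tuple as the sort key, since Scala provides an implicit Ordering for tuples of ordered elements. The sketch below sorts by the first column and breaks ties with the third; `sortByFirstThenThird` and `SortByTwoColumnsSketch` are hypothetical names used only for illustration, and the local master is an assumption.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object SortByTwoColumnsSketch {
  // Hypothetical helper (not a Spark API): orders rows by column 0,
  // then by column 2 when the first column is equal.
  def sortByFirstThenThird(rdd: RDD[Array[Int]]): RDD[Array[Int]] =
    rdd.sortBy(row => (row(0), row(2)))

  def main(args: Array[String]): Unit = {
    // Assumption: local mode, for trying the sketch without a cluster.
    val sc = new SparkContext(new SparkConf().setAppName("sort-two-cols").setMaster("local[*]"))
    try {
      val rdd = sc.parallelize(Seq(Array(1, 2, 3), Array(2, 3, 4), Array(1, 2, 1)))
      sortByFirstThenThird(rdd).collect().foreach(row => println(row.mkString("\t")))
    } finally {
      sc.stop()
    }
  }
}
```

For the sample data this should give (1, 2, 1), (1, 2, 3), (2, 3, 4): the two rows that share a first column of 1 are ordered by their third column.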

Upvotes: 2
