Reputation: 611
I'm interested in Apache Spark.
I'm trying to sort an RDD of arrays (RDD[Array[Int]]) in ascending order by an arbitrary column, in Scala.
(i.e. given RDD[Array[Int]] -> Array(Array(1,2,3), Array(2,3,4), Array(1,2,1)):
If I sort by the first column, the result should be Array(Array(1,2,3), Array(1,2,1), Array(2,3,4)).
If I sort by the third column, the result should be Array(Array(1,2,1), Array(1,2,3), Array(2,3,4)).)
The return value should again be an RDD[Array[Int]].
Is there a method to do this, for example using the map() or filter() function?
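For reference, the input RDD can be built like this (a minimal sketch; sc is assumed to be the SparkContext provided by spark-shell):
import org.apache.spark.rdd.RDD

// sc is the SparkContext from spark-shell
val rdd: RDD[Array[Int]] = sc.parallelize(Seq(Array(1, 2, 3), Array(2, 3, 4), Array(1, 2, 1)))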
Upvotes: 0
Views: 1836
Reputation: 5710
val baseRdd = sc.parallelize(Array(Array(1, 2, 3), Array(2, 3, 4), Array(1, 2, 1)))
// false as the second argument specifies descending order
val result = baseRdd.sortBy(x => x(1), false)
result.foreach { x => println(x(0) + "\t" + x(1) + "\t" + x(2)) }
Result
2 3 4
1 2 3
1 2 1
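The question asks for ascending order; the same call works with ascending = true (or with the flag dropped, since true is the default). A small variation of the snippet above, sorting by the first column:
// ascending sort by the first column (index 0)
val ascendingResult = baseRdd.sortBy(x => x(0), ascending = true)
ascendingResult.foreach { x => println(x(0) + "\t" + x(1) + "\t" + x(2)) }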
Upvotes: 0
Reputation: 37852
Use RDD.sortBy:
// sorting by second column (index = 1)
val result: RDD[Array[Int]] = rdd.sortBy(_(1), ascending = true)
The sorting function can also be written using Pattern Matching:
val result: RDD[Array[Int]] = rdd.sortBy({
  case Array(a, b, c) => b /* choose column(s) to sort by */
}, ascending = true)
Also note that the ascending argument's default value is true, so you can drop it and get the same result:
val result: RDD[Array[Int]] = rdd.sortBy(_(1))
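Putting it together, a minimal end-to-end sketch (assuming sc is an existing SparkContext, e.g. from spark-shell) that reproduces the results described in the question:
import org.apache.spark.rdd.RDD

val rdd: RDD[Array[Int]] = sc.parallelize(Seq(Array(1, 2, 3), Array(2, 3, 4), Array(1, 2, 1)))

// ascending sort by the first column (index 0); rows that tie on the key
// (the two rows starting with 1) have no guaranteed relative order
val byFirst: RDD[Array[Int]] = rdd.sortBy(_(0))
byFirst.collect().foreach(a => println(a.mkString(",")))

// ascending sort by the third column (index 2)
// prints 1,2,1 then 1,2,3 then 2,3,4
val byThird: RDD[Array[Int]] = rdd.sortBy(_(2))
byThird.collect().foreach(a => println(a.mkString(",")))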
Upvotes: 2