user3316676

Reputation: 109

sortBy a column of RDD in Spark

I have an RDD named points of type RDD[(Double, Double)], and I need to sort it iteratively by each column. The column to sort by is stored in the variable 'axis', which is 0 or 1 depending on whether the RDD should be sorted by the first or the second column. I tried the following, but neither version works:

    val sorted = points.sortBy(p => p._(axis))

or,

    val sorted = points.sortBy(_(axis))

I get the following error:

    Error:(18, 39) (Double, Double) does not take parameters
    Error occurred in an application involving default arguments.

Any help in this regard would be appreciated. Thanks!

Upvotes: 2

Views: 2484

Answers (2)

SCouto

Reputation: 7926

You can use the productElement method to access an element of a tuple dynamically by its 0-based index.

The only problem is that this method returns an Any, so you need to convert the result to Double. One simple way to do that is to go through a String, which also works when the elements are Ints, as in the example below.

Try this:

    points.sortBy(_.productElement(axis).toString.toDouble)

EXAMPLE

input

points.foreach(println)
(0,1)
(1,0)

AXIS = 1

scala> val axis= 1
axis: Int = 1

scala> points.sortBy(_.productElement(axis).toString.toDouble)
res19: org.apache.spark.rdd.RDD[(Int, Int)] = MapPartitionsRDD[16] at sortBy at <console>:28

scala> res19.foreach(println)
(1,0)
(0,1)

AXIS = 0

scala> val axis= 0
axis: Int = 0

scala> points.sortBy(_.productElement(axis).toString.toDouble)
res24: org.apache.spark.rdd.RDD[(Int, Int)] = MapPartitionsRDD[26] at sortBy at <console>:28

scala> res24.foreach(println)
(0,1)
(1,0)
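
A possible variation on the productElement approach, sketched on a plain Scala Seq instead of an RDD so that it runs without Spark. Rather than going through a String, it matches on java.lang.Number, which covers boxed Double and Int elements. The object name SortByAxisSketch and the helper coordinate are made up for illustration; RDD.sortBy accepts the same kind of key function.

```scala
object SortByAxisSketch {
  // Plain collection standing in for the RDD (assumption: sample data).
  val points: Seq[(Double, Double)] = Seq((0.0, 1.0), (1.0, 0.0), (0.5, 0.5))

  // Hypothetical helper: read tuple element `axis` (0-based) as a Double.
  def coordinate(p: Product, axis: Int): Double =
    p.productElement(axis) match {
      case n: java.lang.Number => n.doubleValue() // boxed Double, Int, ...
      case other               => other.toString.toDouble // fallback via String
    }

  val byFirst: Seq[(Double, Double)]  = points.sortBy(p => coordinate(p, 0))
  val bySecond: Seq[(Double, Double)] = points.sortBy(p => coordinate(p, 1))
}
```

With a real RDD, calling collect() before printing (for example sorted.collect().foreach(println)) brings the elements back to the driver in sorted order, whereas foreach(println) directly on the RDD runs per partition and does not guarantee the printed order.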

Upvotes: 1

Mahmoud Hanafy

Reputation: 1897

You can do it this way:

  def sortValue(axis: Int)(p: (Double, Double)) = if (axis == 0) p._1 else p._2

  val sorted = points.sortBy(p => sortValue(axis)(p))
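
Since the question mentions sorting by each column in turn, here is a rough sketch of how the same selector idea could be looped over both axes. It uses a plain Scala Seq in place of the RDD so it runs without Spark, and it adds an explicit failure for an axis other than 0 or 1, which is an assumption about the desired behavior. The object name AxisSelectorSketch and the sample data are made up for illustration.

```scala
object AxisSelectorSketch {
  // Plain collection standing in for the RDD (assumption: sample data).
  val points: Seq[(Double, Double)] = Seq((3.0, 1.0), (1.0, 2.0), (2.0, 0.0))

  // Pick the sort key for the given axis; reject anything other than 0 or 1.
  def sortValue(axis: Int)(p: (Double, Double)): Double = axis match {
    case 0 => p._1
    case 1 => p._2
    case _ => throw new IllegalArgumentException(s"axis must be 0 or 1, got $axis")
  }

  // One sorted copy per axis: index 0 is sorted by the first column,
  // index 1 by the second column.
  val sortedPerAxis: Seq[Seq[(Double, Double)]] =
    (0 to 1).map(axis => points.sortBy(p => sortValue(axis)(p)))
}
```

Because the selector returns a Double directly, no conversion from Any is needed, at the cost of being tied to two-element tuples.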

Upvotes: 1
