DanJoe

Reputation: 3

Spark GraphX inDegrees Sorting - sortBy vs sortWith

I am trying to sort the vertex list by in-degree in a Spark GraphX graph (using Scala).

// Sort ascending - both lines below yield the same results

gGraph.inDegrees.collect.sortBy(_._2).take(10)

gGraph.inDegrees.collect.sortWith(_._2 < _._2).take(10)

// Sort descending

gGraph.inDegrees.collect.sortWith(_._2 > _._2).take(10)

gGraph.inDegrees.collect.sortBy(_._2, ascending=false).take(10)     // Doesn't work!

I expected sortBy(_._2, ascending=false) to give the same results as the sortWith(_._2 > _._2) call above, but it fails with the error below. Any thoughts would be appreciated. Thanks!

scala> gGraph.inDegrees.collect.sortBy(_._2, ascending=false).take(10)
<console>:55: error: too many arguments for method sortBy: (f: ((org.apache.spark.graphx.VertexId, Int)) => B)(implicit ord: scala.math.Ordering[B])Array[(org.apache.spark.graphx.VertexId, Int)]
       gGraph.inDegrees.collect.sortBy(_._2, ascending=false).take(10)

Upvotes: 0

Views: 543

Answers (1)

Joe K

Reputation: 18424

Because you call .collect first, you are calling .sortBy on an Array, not on an RDD. Array's sortBy takes only the key function (plus an implicit Ordering), so there is no ascending parameter to pass.
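If you do need to sort an already-collected Array in descending order, here is a minimal sketch of a few ways to do it with plain Scala collections. The `degrees` array is made-up sample data standing in for the result of gGraph.inDegrees.collect, not output from a real graph.

```scala
// Hypothetical sample data standing in for gGraph.inDegrees.collect:
// (vertex id, in-degree) pairs.
val degrees: Array[(Long, Int)] = Array((1L, 3), (2L, 7), (3L, 1), (4L, 5))

// Option 1: negate the sort key (safe here because in-degrees are never negative).
val byNegation = degrees.sortBy(-_._2)

// Option 2: supply a reversed Ordering in sortBy's second (implicit) parameter list.
val byReversedOrdering = degrees.sortBy(_._2)(Ordering[Int].reverse)

// Option 3: sortWith with an explicit "greater than" comparison, as in the question.
val bySortWith = degrees.sortWith(_._2 > _._2)
```

All three give the same descending order for this data. Option 2 is probably the closest equivalent to what ascending=false does on an RDD.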

You should usually let Spark handle as much of the computation as possible, and only collect (or take) at the very end. Try this:

gGraph.inDegrees.sortBy(_._2, ascending=false).take(10)
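For reference, a self-contained sketch of the RDD-side approach on a toy graph. The edge list, app name, and local-mode setup are assumptions for illustration (in spark-shell, `sc` already exists and you would skip creating it). It also shows RDD.top as a possible alternative when you only want the largest few entries, since it does not need a full sort.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Graph, VertexId}

// Local-mode context for illustration only; spark-shell already provides `sc`.
val conf = new SparkConf().setMaster("local[*]").setAppName("inDegreesSketch")
val sc = new SparkContext(conf)

// Hypothetical toy graph: each pair is a directed edge (src, dst).
val edges = sc.parallelize(Seq((1L, 2L), (3L, 2L), (4L, 2L), (1L, 3L), (2L, 3L), (1L, 4L)))
val gGraph = Graph.fromEdgeTuples(edges, defaultValue = 0)

// Sort on the RDD, where sortBy does accept `ascending`, then take the first few.
val sortedTop: Array[(VertexId, Int)] =
  gGraph.inDegrees.sortBy(_._2, ascending = false).take(2)

// Alternative: top(n) returns the n largest elements under the given Ordering.
val viaTop: Array[(VertexId, Int)] =
  gGraph.inDegrees.top(2)(Ordering.by[(VertexId, Int), Int](_._2))

sc.stop()
```

Note that inDegrees only contains vertices with at least one incoming edge, so vertex 1 in this toy graph does not appear in either result.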

Upvotes: 1
