Reputation: 3
I am trying to sort the vertex list by in-degree in a Spark graph (GraphX, using Scala):
// Sort ascending - both of the lines below yield the same results
gGraph.inDegrees.collect.sortBy(_._2).take(10)
gGraph.inDegrees.collect.sortWith(_._2 < _._2).take(10)
// Sort descending
gGraph.inDegrees.collect.sortWith(_._2 > _._2).take(10)
gGraph.inDegrees.collect.sortBy(_._2, ascending=false).take(10) // Doesn't work!
I expect sortBy(_._2, ascending=false) to give the same results as the sortWith(_._2 > _._2) call above, but I get the error below. I'd appreciate any thoughts on this. Thanks!
scala> gGraph.inDegrees.collect.sortBy(_._2, ascending=false).take(10)
<console>:55: error: too many arguments for method sortBy: (f: ((org.apache.spark.graphx.VertexId, Int)) => B)(implicit ord: scala.math.Ordering[B])Array[(org.apache.spark.graphx.VertexId, Int)]
       gGraph.inDegrees.collect.sortBy(_._2, ascending=false).take(10)
Upvotes: 0
Views: 543
Reputation: 18424
Since you call .collect first, you are calling .sortBy on an Array, not on an RDD. Array's sortBy method takes only the key function in its argument list (the second parameter list is an implicit Ordering), so you can't specify ascending.
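If you do want to sort an already-collected Array in descending order with sortBy, the implicit Ordering shown in the error's signature is the hook for it. Here is a minimal sketch using plain Scala collections; the sample data and the object name DescendingSort are made up for illustration and stand in for the result of gGraph.inDegrees.collect:

```scala
object DescendingSort {
  // Hypothetical sample data: (vertexId, inDegree) pairs,
  // standing in for gGraph.inDegrees.collect.
  val degrees: Array[(Long, Int)] = Array((1L, 3), (2L, 7), (3L, 1), (4L, 5))

  // Option 1: pass a reversed Ordering explicitly in the second parameter list.
  val byReversedOrdering: Array[(Long, Int)] =
    degrees.sortBy(_._2)(Ordering[Int].reverse)

  // Option 2: negate the sort key (fine for in-degrees, which are small non-negative Ints).
  val byNegatedKey: Array[(Long, Int)] =
    degrees.sortBy(d => -d._2)

  def main(args: Array[String]): Unit = {
    // Print the two vertices with the highest in-degree.
    println(byReversedOrdering.take(2).mkString(", "))
  }
}
```

Both options give the same order as sortWith(_._2 > _._2) on this data. On a large graph this still pulls every vertex to the driver first, so it is only reasonable when the collected array is small.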
You should usually let Spark handle as much of the computation as possible, and only collect (or take) at the very end. Try this:
gGraph.inDegrees.sortBy(_._2, ascending=false).take(10)
Upvotes: 1