ulrich

Reputation: 3587

Scala select n-th element of the n-th element of an array / RDD

I have the following RDD:

val a = List((3, 1.0), (2, 2.0), (4, 2.0), (1,0.0))
val rdd = sc.parallelize(a)

Ordering the tuples by their right-hand component in ascending order, I would like to:

  1. Pick the 2nd smallest result, i.e. (3, 1.0)
  2. Select its left-hand element, i.e. 3

The following code does that but it is so ugly and inefficient that I was wondering if someone could suggest something better.

val b = ((rdd.takeOrdered(2).zipWithIndex.map{case (k,v) => (v,k)}).toList find {x => x._1 == 1}).map(x => x._2).map(x=> x._1)

Upvotes: 1

Views: 848

Answers (3)

VasiliNovikov

Reputation: 10236

Sorry, I don't know Spark, but maybe a standard method from Scala collections would work? On the plain List (indices are zero-based, so the 2nd smallest is at index 1):

a.sortBy { case (k, v) => v -> k }.apply(1)._1

Upvotes: 1

ulrich

Reputation: 3587

val a = List((3, 1.0), (2, 2.0), (4, 2.0), (1,0.0))
val rdd = sc.parallelize(a)
rdd.map(_.swap).takeOrdered(2).max._2
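
For anyone without a Spark shell at hand, here is a rough sketch of the same idea on a plain Scala List. It is only an illustration under a few assumptions: `sorted.take(n)` stands in for `takeOrdered(n)`, and the object `SecondSmallest` and the helper `nthSmallestKey` are hypothetical names, not part of Spark or the standard library.

```scala
object SecondSmallest {
  // Hypothetical helper: returns the left-hand element of the tuple with the
  // n-th smallest right-hand value (n is 1-based), or None if fewer than n tuples exist.
  def nthSmallestKey(pairs: List[(Int, Double)], n: Int): Option[Int] =
    pairs
      .map(_.swap)   // (value, key): the default tuple ordering now compares the value first
      .sorted        // ascending by value, ties broken by key
      .take(n)       // local stand-in for RDD.takeOrdered(n)
      .lift(n - 1)   // the n-th smallest tuple, if it exists
      .map(_._2)     // back to the original left-hand element

  def main(args: Array[String]): Unit = {
    val a = List((3, 1.0), (2, 2.0), (4, 2.0), (1, 0.0))
    println(nthSmallestKey(a, 2)) // Some(3)
  }
}
```

Swapping first is what makes the default tuple ordering compare the Double component before the Int one; without the swap the tuples would be ordered by their left-hand element instead.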

Upvotes: 0

Jean Logeart

Reputation: 53819

Simply:

implicit val ordering = scala.math.Ordering.Tuple2[Double, Int]

rdd.map(_.swap).takeOrdered(2).max._2

Upvotes: 1
