Reputation: 842
I ask this question because I have to find one specific element in an RDD[(Int, Array[Double])] whose keys are unique. Using filter on the entire RDD seems costly when I need only one element whose key I already know.
val wantedkey = 94
val res = rdd.filter( x => x._1 == wantedkey )
Thank you for your advice.
Upvotes: 1
Views: 137
Reputation: 3354
Look at the lookup function in PairRDDFunctions.scala.
def lookup(key: K): Seq[V]
Return the list of values in the RDD for key key. This operation is
done efficiently if the RDD has a known partitioner by only searching
the partition that the key maps to.
Example
val a = sc.parallelize(List("dog", "tiger", "lion", "cat", "panther", "eagle"), 2)
val b = a.keyBy(_.length)
b.lookup(5)
res0: Seq[String] = WrappedArray(tiger, eagle)
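The RDD in the example above has no partitioner, so lookup still has to scan every partition. A minimal sketch of the partitioned case, applied to the shape of RDD from the question, might look like the following. The object name LookupSketch, the helpers withPartitioner and findByKey, the partition count of 8, and the generated sample data are all assumptions made for illustration; only partitionBy, HashPartitioner, cache, and lookup are Spark's own API.

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object LookupSketch {
  // Hypothetical helper: attach a known partitioner and cache the result,
  // so the shuffle is paid once and later lookups can reuse it.
  def withPartitioner(rdd: RDD[(Int, Array[Double])], parts: Int): RDD[(Int, Array[Double])] =
    rdd.partitionBy(new HashPartitioner(parts)).cache()

  // Hypothetical helper: lookup returns Seq[V]; with unique keys it holds
  // at most one element, so headOption is enough.
  def findByKey(rdd: RDD[(Int, Array[Double])], key: Int): Option[Array[Double]] =
    rdd.lookup(key).headOption

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("lookup-sketch").setMaster("local[2]"))
    try {
      // Made-up sample data standing in for the asker's RDD.
      val rdd = sc.parallelize((0 until 1000).map(k => (k, Array(k.toDouble, k * 2.0))))
      val partitioned = withPartitioner(rdd, 8)
      println(findByKey(partitioned, 94).map(_.mkString(", ")))
    } finally sc.stop()
  }
}
```

Partitioning costs a full shuffle, so this is likely to pay off only when several lookups run against the same cached RDD; for a single lookup, a plain scan may be just as cheap.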
Upvotes: 1
Reputation: 4515
All transformations are lazy; they are computed only when you call an action on them. So you can just write:
val wantedkey = 94
val res = rdd.filter( x => x._1 == wantedkey ).first()
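One caveat with the snippet above: first() throws an UnsupportedOperationException when the filtered RDD is empty, that is, when the key is absent. A hedged variant that returns an Option instead could be sketched as follows. The object name FirstMatchSketch, the helper findByKey, and the sample data are hypothetical; filter and take are the standard RDD calls.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object FirstMatchSketch {
  // Hypothetical helper: take(1) scans partitions in growing batches and
  // stops once it has one element; headOption turns "no match" into None.
  def findByKey(rdd: RDD[(Int, Array[Double])], key: Int): Option[Array[Double]] =
    rdd.filter { case (k, _) => k == key }.take(1).headOption.map(_._2)

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("first-match-sketch").setMaster("local[2]"))
    try {
      // Made-up sample data standing in for the asker's RDD.
      val rdd = sc.parallelize((0 until 1000).map(k => (k, Array(k.toDouble, k * 2.0))))
      println(findByKey(rdd, 94).map(_.mkString(", ")))
      println(findByKey(rdd, 5000).isEmpty)
    } finally sc.stop()
  }
}
```

Because take(1) tries one partition first and widens the scan only if nothing is found, an early match avoids touching the rest of the data, while a key in the last partition or a missing key still leads to a full scan.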
Upvotes: 1