Sugimiyanto

Reputation: 329

What is an alternative, faster way to look up an element in an RDD?

I am new to Scala and Spark. This is a simplified example of my whole code:

package trouble.something

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object Stack {
  def ExFunc2(looku: RDD[(Int, List[(Double, Int)])], ke: Int): Seq[List[(Double, Int)]] = {
    val y: Seq[List[(Double, Int)]] = looku.lookup(ke)
    val g = y.map{x =>
      x
      /* some functions here
      .
      .
       */
    }
    g
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("toy")
    val sc = new SparkContext(conf)

    val pi: RDD[(Int, List[(Double, Int)])] = sc.parallelize(Seq((1, List((9.0, 3), (7.0, 2))), (2, List((7.0, 1), (1.0, 3))), (3, List((1.0, 2), (9.0, 1)))))
    val res = ExFunc2(pi, 1)
    println(res)
  }
}

I am running this on a fairly large dataset and need better performance. Judging by Spark's web UI and a software profiler, the most time-consuming part is the lookup() function:

 val y: Seq[List[(Double, Int)]] = looku.lookup(ke)

What is an alternative, faster way to look up an element in an RDD than the lookup() function?

There is a related discussion, Spark: Fastest way to look up an element in an RDD, but it does not give me any ideas.

Upvotes: 2

Views: 1349

Answers (1)

soote

Reputation: 3260

You should not have performance issues with the lookup function if you use and scale it carefully.

def lookup(key: K): Seq[V]

Return the list of values in the RDD for key key. This operation is done efficiently if the RDD has a known partitioner by only searching the partition that the key maps to.

By default, functions which generate a PairRdd use the HashPartitioner, so check what your spark.default.parallelism value is set to, since this is the number of partitions that the HashPartitioner will default to. You can tune that parameter to match the number of executors * cores per executor you are using.
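For example, a minimal sketch of setting and checking this value (the value 16 is an assumption; substitute your own executors * cores):

import org.apache.spark.{SparkConf, SparkContext}

// Assumption: 4 executors * 4 cores per executor = 16
val conf = new SparkConf()
  .setMaster("local[*]")
  .setAppName("toy")
  .set("spark.default.parallelism", "16")
val sc = new SparkContext(conf)

// The number of partitions the HashPartitioner will default to
println(sc.defaultParallelism)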

You should confirm that your PairRdd does in fact have a known partitioner, and if it does not, use partitionBy to create one, or modify your existing code to use a HashPartitioner when the PairRdd is created.
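For instance, a minimal sketch based on the question's code, assuming a parallelismFactor of 16 (tune this to your own executors * cores):

import org.apache.spark.HashPartitioner
import org.apache.spark.rdd.RDD

// Assumption: parallelismFactor = # of executors * # of cores per executor
val parallelismFactor = 16

// Give the pair RDD a known HashPartitioner and cache it, so the shuffle
// from partitionBy is paid once and every subsequent lookup() only scans
// the single partition the key hashes to.
val pi: RDD[(Int, List[(Double, Int)])] = sc
  .parallelize(Seq(
    (1, List((9.0, 3), (7.0, 2))),
    (2, List((7.0, 1), (1.0, 3))),
    (3, List((1.0, 2), (9.0, 1)))))
  .partitionBy(new HashPartitioner(parallelismFactor))
  .persist()

// lookup() now searches only the partition that key 1 maps to
val y: Seq[List[(Double, Int)]] = pi.lookup(1)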

let parallelismFactor = # of executors * # of cores per executor

Then if the lookup function is still too slow, you will need to increase the parallelismFactor you are using. Now Spark will know which partition to look up in, and as you increase the parallelismFactor, you will reduce the size of each partition, which will increase the speed of the lookup.

Keep in mind that you may wish to have many times more partitions than executors * cores; you will have to benchmark your use case yourself, trying values from 1 to 10 times more partitions than executors * cores.
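A rough benchmarking sketch, reusing pi and the HashPartitioner import from the sketch above (the core count and the timing approach are assumptions, not part of the original answer):

val coresTotal = 16  // assumption: executors * cores per executor

for (multiple <- 1 to 10) {
  val candidate = pi.partitionBy(new HashPartitioner(coresTotal * multiple)).persist()
  candidate.count()                    // materialize the repartitioned, cached RDD
  val start = System.nanoTime()
  candidate.lookup(1)                  // the operation being measured
  val elapsedMs = (System.nanoTime() - start) / 1e6
  println(s"${coresTotal * multiple} partitions: $elapsedMs ms")
  candidate.unpersist()
}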

Upvotes: 3
