MapReddy Usthili

Reputation: 288

Spark Iterating RDD over another RDD with filter conditions Scala

I want to iterate one BIG RDD against a small RDD with some additional filter conditions. The code below works, but the processing runs only on the driver and is not spread across the nodes. Can anyone suggest another approach?

val cross = titlesRDD.cartesian(brRDD).cache()
val matching = cross.filter { case (x, br) =>
  br._1 == "0" &&
  br._2 == x._4 &&
  (br._3.exists(x._5) || br._3.head == "")
}

Thanks, madhu

Upvotes: 1

Views: 1055

Answers (1)

Jason Scott Lenderman

Reputation: 1918

You probably don't want to cache cross. Not caching it will, I believe, let the cartesian product happen "on the fly" as needed for the filter, instead of instantiating the potentially large number of combinations resulting from the cartesian product in memory.

Also, you can do brRDD.filter(_._1 == "0") before doing the cartesian product with titlesRDD, e.g.

val cross = titlesRDD.cartesian(brRDD.filter(_._1 == "0"))

and then modify the filter used to create matching appropriately.
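Putting both suggestions together, a minimal sketch might look like this (assuming, as the original exists call suggests, that br._3 is a collection and x._5 is a predicate over its elements; if x._5 is instead a plain value, contains would be the right call). The br._1 == "0" check drops out of the filter because it is now applied before the cartesian product:

// Pre-filter the small RDD so the cartesian product only pairs
// titles with the br records of interest, and skip the cache so
// the pairs are produced on the fly rather than held in memory.
val cross = titlesRDD.cartesian(brRDD.filter(_._1 == "0"))

val matching = cross.filter { case (x, br) =>
  br._2 == x._4 &&
  (br._3.exists(x._5) || br._3.head == "")
}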

Upvotes: 3
