Reputation: 288
I wants to iterate one BIG RDD with small RDD with some additional filter conditions . the below code is working fine but the process is running only with Driver and Not spread-ed across the nodes . So please suggest any other approach ?
val cross = titlesRDD.cartesian(brRDD).cache()
val matching = cross.filter{ case( x, br) =>
((br._1 == "0") &&
(((br._2 ==((x._4))) &&
((br._3 exists (x._5)) || ((br._3).head==""))
}
Thanks, madhu
Upvotes: 1
Views: 1055
Reputation: 1918
You probably don't want to cache cross
. Not caching it will, I believe, let the cartesian product happen "on the fly" as needed for the filter, instead of instantiating the potentially large number of combinations resulting from the cartesian product in memory.
Also, you can do brRDD.filter(_._1 == "0")
before doing the cartesian product with titlesRDD
, e.g.
val cross = titlesRDD.cartesian(brRRD.filter(_._1 == "0"))
and then modify the filter used to create matching
appropriately.
Upvotes: 3