Synesso

Reputation: 38978

Intersection of RDDs based on predicate

I have

val foo: RDD[T]
val bar: RDD[T]
val f: (T, T) => Boolean

Without calling collect(), how can I find the members of foo for which there exists a member of bar that causes f to be true?

Specifically, in my case, T is a 2D-line and f checks geometric intersections. I'm looking for all foo lines that cross any bar line.

Upvotes: 1

Views: 135

Answers (1)

Radu Ionescu

Reputation: 3532

val intersections = foo.cartesian(bar).filter { case (a, b) => f(a, b) }.map(_._1).distinct()

But this will be very slow: cartesian produces every (foo, bar) pair, so f is evaluated |foo| × |bar| times.

An alternative solution is based on set logic: the set of "intersects at least one" = total_set - the set of "intersects none". This turns the problem into finding the foos that do not intersect any bar. I am going to write this solution in pseudocode.

while condition
      select one bar at random
      collect the foos in remaining_foos that intersect that bar
      update remaining_foos by filtering out the collected foos
apply the cartesian solution above to remaining_foos, looking for the opposite relation (foos that intersect no bar)

In the implementation, condition can be a fixed number of steps and/or remaining_foos having shrunk to an acceptable size.

Use a broadcast variable to distribute the collected foos.
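
A rough Scala sketch of the loop above, not a tuned implementation. It makes a few assumptions that are not in the pseudocode: the object name PredicateIntersection, the method name intersecting, and the parameters maxSteps and barsPerStep are hypothetical, and T is assumed to have a sensible equals/hashCode because subtract relies on it. It also deviates from the pseudocode in one place: it broadcasts the sampled bars rather than the collected foos, so nothing from foo is pulled to the driver.

```scala
import scala.reflect.ClassTag
import org.apache.spark.rdd.RDD

object PredicateIntersection {

  // Returns the members of foo for which at least one member of bar makes f true.
  // maxSteps and barsPerStep are hypothetical tuning knobs.
  def intersecting[T: ClassTag](
      foo: RDD[T],
      bar: RDD[T],
      f: (T, T) => Boolean,
      maxSteps: Int = 5,
      barsPerStep: Int = 1
  ): RDD[T] = {
    val sc = foo.sparkContext
    var remaining: RDD[T] = foo
    var step = 0

    // "condition": a fixed number of steps, or nothing left to check.
    while (step < maxSteps && !remaining.isEmpty()) {
      // Select bar(s) at random; the sample is small, so it can be broadcast.
      val picked = sc.broadcast(bar.takeSample(withReplacement = false, num = barsPerStep))
      // Foos that match a sampled bar are known hits; drop them from remaining.
      remaining = remaining.filter(a => !picked.value.exists(b => f(a, b)))
      step += 1
    }

    // Cartesian check on the (hopefully much smaller) remainder.
    val hitsInRemaining = remaining
      .cartesian(bar)
      .filter { case (a, b) => f(a, b) }
      .map(_._1)

    // Opposite relation: foos that match no bar at all.
    val matchesNone = remaining.subtract(hitsInRemaining)

    // "intersects at least one" = total_set - "intersects none"
    foo.subtract(matchesNone)
  }
}
```

Usage would be something like PredicateIntersection.intersecting(foo, bar, f). How much this helps depends on the data: it only pays off if a few sampled bars eliminate a large share of foo before the final cartesian step. Each isEmpty() call also runs a job, so caching remaining between steps may be worth trying.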

Upvotes: 1
