Reputation: 38978
I have
val foo: RDD[T]
val bar: RDD[T]
val f: (T, T) => Boolean
Without calling collect(), how can I find the members of foo for which there exists a member of bar that causes f to be true?
Specifically, in my case, T is a 2D line and f checks geometric intersections. I'm looking for all foo lines that cross any bar line.
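For concreteness, here is a minimal sketch of what T and f might look like. It assumes the "2D lines" are finite segments, and the names Point, Segment, orientation, onSegment and intersects are illustrative only, not taken from any real code. It uses the usual orientation test, so results for nearly collinear inputs are subject to floating-point rounding.

```scala
// Hypothetical stand-ins for T and f; all names here are illustrative.
final case class Point(x: Double, y: Double)
final case class Segment(p: Point, q: Point)

// Sign of the cross product (b - a) x (c - a):
// 1 = counter-clockwise turn, -1 = clockwise turn, 0 = collinear.
def orientation(a: Point, b: Point, c: Point): Int =
  math.signum((b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x)).toInt

// True if c lies inside the bounding box of a and b.
// Only meaningful when c is already known to be collinear with a and b.
def onSegment(a: Point, b: Point, c: Point): Boolean =
  math.min(a.x, b.x) <= c.x && c.x <= math.max(a.x, b.x) &&
    math.min(a.y, b.y) <= c.y && c.y <= math.max(a.y, b.y)

// Segment/segment intersection test; touching endpoints count as crossing.
def intersects(s: Segment, t: Segment): Boolean = {
  val o1 = orientation(s.p, s.q, t.p)
  val o2 = orientation(s.p, s.q, t.q)
  val o3 = orientation(t.p, t.q, s.p)
  val o4 = orientation(t.p, t.q, s.q)
  (o1 != o2 && o3 != o4) ||                  // proper crossing or touching
    (o1 == 0 && onSegment(s.p, s.q, t.p)) || // collinear overlap cases
    (o2 == 0 && onSegment(s.p, s.q, t.q)) ||
    (o3 == 0 && onSegment(t.p, t.q, s.p)) ||
    (o4 == 0 && onSegment(t.p, t.q, s.q))
}
```

With these definitions, T would be Segment and f would be intersects.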
Upvotes: 1
Views: 135
Reputation: 3532
val intersections = foo.cartesian(bar).filter { case (a, b) => f(a, b) }.map(_._1).distinct()
But this will be very slow, because cartesian produces every (foo, bar) pair, so f is evaluated |foo| × |bar| times.
An alternative solution is based on set logic: set of "intersects at least one" = total_set - set of "intersects none". This turns the problem into finding the foo that do not intersect any bar. I am going to write this solution in pseudocode.
while condition
    select randomly one bar
    collect foo's that intersect bar in remaining_foos
    update remaining_foos by filtering on previously collected foo's
apply above solution looking for opposite relation
In the implementation, condition can be a fixed number of steps and/or a check that remaining_foos has shrunk to an acceptable size.
Use a broadcast variable to distribute the collected foo's.
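The pseudocode above could be sketched in Spark roughly as follows. This is only one possible reading of it, not a tested production implementation: the function name intersectingFoos and the parameters maxSteps and smallEnough are hypothetical, and instead of collecting the matched foo's to the driver it broadcasts the sampled bar and splits remaining_foos with f and its negation. It also assumes f is serializable and that T has a sensible equals/hashCode for distinct().

```scala
import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

// Sketch only: maxSteps and smallEnough are hypothetical tuning knobs.
def intersectingFoos[T: ClassTag](
    foo: RDD[T],
    bar: RDD[T],
    f: (T, T) => Boolean,
    maxSteps: Int = 10,
    smallEnough: Long = 10000L
): RDD[T] = {
  val sc = foo.sparkContext
  var remaining: RDD[T] = foo.distinct().cache() // foo's not yet known to intersect
  var hits: RDD[T] = sc.emptyRDD[T]              // foo's known to intersect some bar
  var step = 0

  // "condition": a fixed number of steps, or remaining has become small enough.
  while (step < maxSteps && remaining.count() > smallEnough) {
    // Select one bar at random; takeSample returns it to the driver.
    bar.takeSample(withReplacement = false, num = 1).headOption.foreach { b =>
      val sampled = sc.broadcast(b)
      val current = remaining
      // foo's that intersect the sampled bar are done ...
      hits = hits.union(current.filter(a => f(a, sampled.value)))
      // ... and are dropped from the set still to be checked.
      remaining = current.filter(a => !f(a, sampled.value)).cache()
    }
    step += 1
  }

  // Fall back to the cartesian approach on whatever is left.
  val rest = remaining
    .cartesian(bar)
    .filter { case (a, b) => f(a, b) }
    .map(_._1)
    .distinct()

  hits.union(rest)
}
```

Whether this beats the plain cartesian version depends on how many foo's each sampled bar eliminates; if a typical bar intersects only a few foo's, the loop does little more than add extra jobs.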
Upvotes: 1