Piero Zito

Reputation: 67

Apply a function to all pairs of elements of my RDD

I will try to state the problem in a general way.

I have a function like this:

 myFunction (Object first, Object second)

And I have an RDD of objects, RDD[Object].

I need to apply myFunction to the elements of the RDD so that, by the end of the process, it has been called on every pair of objects, myFunction(.., ..).

One way, maybe, is to create a broadcast variable (as a copy of my RDD), and then:

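 // Broadcast a full copy of the RDD's elements to every executor
 val broadcastVar = sc.broadcast(rdd.collect())
 rdd_line.mapPartitions(p => {
   val brd = broadcastVar.value
   // p is a one-pass Iterator, so it drives the outer loop;
   // iterating it inside brd.foreach would exhaust it after the first b
   p.flatMap(e => brd.iterator.map(b => myFunction(b, e)))
 })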

Is there another way to do this with better performance?

Upvotes: 0

Views: 62

Answers (1)

Joe K

Reputation: 18434

Use the RDD's .cartesian method to get an RDD containing all pairs of elements from two RDDs. In this case, you want the cartesian product of the RDD with itself:

rdd.cartesian(rdd).map({ case (x, y) => myFunction(x, y) })

Note that this will include pairs of an element with itself, (a, a), and pairs in both orders, i.e. (a, b) as well as (b, a).
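If the self-pairs and the reversed duplicates are not wanted, one possible approach is to tag each element with an index and keep only the pairs whose left index is smaller. The following is a minimal sketch, not part of the original answer. It assumes a local SparkContext, an RDD of Int instead of Object, and a hypothetical stand-in body for myFunction; it is written as top-level statements, as you would type them into spark-shell or a Scala script.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical stand-in for the asker's myFunction (assumption: any
// two-argument function returning Double would work the same way).
def myFunction(first: Int, second: Int): Double = (first * second).toDouble

// Assumption: local mode, just so the sketch runs on its own.
val conf = new SparkConf().setAppName("unordered-pairs").setMaster("local[2]")
val sc = new SparkContext(conf)

val rdd = sc.parallelize(Seq(1, 2, 3, 4))

// Attach a unique Long index to every element: RDD[(Int, Long)].
val indexed = rdd.zipWithIndex()

// Keep only pairs whose left index is strictly smaller than the right one.
// This drops (a, a) and keeps exactly one of (a, b) / (b, a).
val results = indexed.cartesian(indexed)
  .filter { case ((_, i), (_, j)) => i < j }
  .map { case ((x, _), (y, _)) => myFunction(x, y) }

val collected = results.collect()
sc.stop()
```

The cartesian product still generates all n * n candidate pairs before the filter runs, so this roughly halves the calls to myFunction but does not reduce the work of building the product itself.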

Upvotes: 2
