aster2590

Reputation: 41

How to avoid nested RDDs in Spark without using an Array

I have a big problem!

I have an RDD[(Int, Vector)], where the Int is a sort of label.

For example:

(0, (a,b,c) );
(0, (d,e,f) );
(1, (g,h,i) )

etc...

Now, I need to use this RDD (I call it myrdd) like this:

myrdd.map{  case(l,v) => 
   myrdd.map { case(l_, v_) => 
      compare(v, v_)
   }
}

Now, I know that nesting one RDD inside another is impossible in Spark.

I could bypass the problem by collecting one copy into an Array, but for my problem I can't use an Array, or anything else that has to fit in memory.

How could I solve my problem WITHOUT USING AN ARRAY?

Thanks in advance!!!

Upvotes: 0

Views: 224

Answers (1)

Justin Pihony

Reputation: 67135

cartesian sounds like it should work:

myrdd.cartesian(myrdd).map{
  case ((_,v),(_,v_)) => compare(v,v_)
}
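To see why this replaces the nested map: `rdd.cartesian(rdd)` produces every ordered pair of rows, which is exactly what the nested loop would have computed. A minimal sketch of that pairing semantics, using plain Python lists and `itertools.product` in place of RDDs (the `compare` function here is a hypothetical stand-in, since the question doesn't define it):

```python
from itertools import product

# Toy stand-ins for the (label, vector) rows from the question.
myrdd = [(0, ("a", "b", "c")),
         (0, ("d", "e", "f")),
         (1, ("g", "h", "i"))]

def compare(v, w):
    # Hypothetical comparison: number of shared elements.
    return len(set(v) & set(w))

# rdd.cartesian(rdd) yields every ordered pair of rows;
# itertools.product reproduces that pairing on plain lists.
results = [compare(v, w) for (_, v), (_, w) in product(myrdd, myrdd)]

print(len(results))  # 9 = 3 * 3 ordered pairs
```

In Spark the same cross product is computed distributedly, so neither copy of the RDD is ever collected into driver memory (though a cartesian product of n rows still produces n² pairs, which can be expensive for large inputs).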

Upvotes: 2
