jag

Reputation: 93

PySpark RDD transformation

I have an RDD containing a list of floats:

[1.0, 3.0, 4.0, 2.0]

and I want a transformed RDD like this:

[(1.0, 3.0), (1.0, 4.0), (1.0, 2.0), (3.0, 4.0), (3.0, 2.0), (4.0, 2.0)]

Any help is appreciated.

Upvotes: 0

Views: 123

Answers (1)

Daniel Darabos

Reputation: 27455

You need RDD.cartesian.

Return the Cartesian product of this RDD and another one, that is, the RDD of all pairs of elements (a, b) where a is in self and b is in other.

>>> rdd = sc.parallelize([1, 2])
>>> sorted(rdd.cartesian(rdd).collect())
[(1, 1), (1, 2), (2, 1), (2, 2)]

Note that this returns the pairs in both directions (both (a, b) and (b, a)). Hopefully that is not a problem for you.
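If you only want each unordered pair once, as in the question's expected output, one possible approach (not part of the original answer) is to pair each element with its index via zipWithIndex, take the cartesian product, and keep only pairs where the first index is smaller. Here is the same filtering logic sketched in plain Python with lists standing in for RDDs; the RDD version would use rdd.zipWithIndex(), .cartesian(), and .filter() with the same predicate:

```python
# Plain-Python sketch of the "keep one direction" filter.
# In PySpark this would be:
#   indexed = rdd.zipWithIndex()
#   pairs = (indexed.cartesian(indexed)
#                   .filter(lambda p: p[0][1] < p[1][1])
#                   .map(lambda p: (p[0][0], p[1][0])))
data = [1.0, 3.0, 4.0, 2.0]

# enumerate gives (index, value); zipWithIndex gives (value, index),
# so the unpacking order differs slightly in real PySpark code.
indexed = list(enumerate(data))

# Cartesian product of the indexed list with itself.
product = [(a, b) for a in indexed for b in indexed]

# Keep each unordered pair exactly once, preserving the original order.
pairs = [(a_val, b_val)
         for (a_idx, a_val), (b_idx, b_val) in product
         if a_idx < b_idx]
# pairs == [(1.0, 3.0), (1.0, 4.0), (1.0, 2.0), (3.0, 4.0), (3.0, 2.0), (4.0, 2.0)]
```

Filtering on indices rather than on the values themselves avoids dropping legitimate pairs when the list contains duplicate floats.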

Upvotes: 1
