Jack Daniel

Reputation: 2611

Key,[list of values] to Key,[Combinations of values] in Spark

I am a newbie to Spark with Python. I am trying to get the combinations of the list of values for each key, but I am stuck.

Let's say my paired RDD is (Key, List[]):

(a,[1,2,3])
(b,[2,3])

Now I need to convert this to (key, combinations(List[])):

(a,[1])
(a,[2])
(a,[3])
(a,[1,2])
(a,[1,3])
(a,[1,2,3])
.
.
.

I tried doing this, but failed:

def combis(l,n):
  l = [item for sublist in l for item in sublist]
  return combinations(l,n)

combiusershobby = hobbyusers.flatMap(lambda (a,b) : (a,combis(b,2)))

Here combis takes two arguments, the list and the number of values in each combination, and returns the list of combinations.
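
For reference, this is what itertools.combinations produces for a fixed length on a plain Python list (a minimal local check, outside Spark):

from itertools import combinations

list(combinations([1, 2, 3], 2))
## [(1, 2), (1, 3), (2, 3)]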

How can I achieve this?

Upvotes: 0

Views: 737

Answers (1)

zero323

Reputation: 330063

Plain and simple:

from functools import partial
from itertools import combinations

rdd = sc.parallelize([("a",[1,2,3]), ("b",[2,3])])
combs = rdd.flatMapValues(partial(combinations, r=2))

combs.take(3)
## [('a', (1, 2)), ('a', (1, 3)), ('a', (2, 3))]
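
flatMapValues applies the function to each value and pairs every element of the result with the original key, which is what flattens the combinations into separate records. A minimal illustration of that behaviour, assuming an active SparkContext sc:

sc.parallelize([("a", [1, 2, 3])]).flatMapValues(lambda vs: vs).collect()
## [('a', 1), ('a', 2), ('a', 3)]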

or if you want all lengths from 1 to n:

from itertools import chain 

combs_one_to_n = rdd.flatMapValues(lambda vs: chain(*[
    combinations(vs, i) for i in range(1, len(vs) + 1)]
))

combs_one_to_n.take(5)
## [('a', (1,)), ('a', (2,)), ('a', (3,)), ('a', (1, 2)), ('a', (1, 3))]
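
If you need plain lists rather than tuples, to match the (a, [1]) shape from the question, you can convert each value afterwards; a small sketch using mapValues:

combs_as_lists = combs_one_to_n.mapValues(list)

combs_as_lists.take(3)
## [('a', [1]), ('a', [2]), ('a', [3])]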

Upvotes: 3
