Reputation: 2611
I am new to Spark with Python. I am trying to get all combinations of the list of values for each key, but I am stuck.
Let's say my paired RDD is (key, List[]):
(a,[1,2,3])
(b,[2,3])
Now I need to convert this to (key, combinations(List[])):
(a,[1])
(a,[2])
(a,[3])
(a,[1,2])
(a,[1,3])
(a,[1,2,3])
.
.
.
I tried doing this, but failed:
from itertools import combinations

def combis(l, n):
    l = [item for sublist in l for item in sublist]
    return combinations(l, n)

combiusershobby = hobbyusers.flatMap(lambda (a, b): (a, combis(b, 2)))
Here combis takes two arguments: the list and the number of values in each combination. It returns a list of lists.
How can I achieve this?
Upvotes: 0
Views: 737
Reputation: 330063
Plain and simple:
from functools import partial
from itertools import combinations
rdd = sc.parallelize([("a",[1,2,3]), ("b",[2,3])])
combs = rdd.flatMapValues(partial(combinations, r=2))
combs.take(3)
## [('a', (1, 2)), ('a', (1, 3)), ('a', (2, 3))]
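To see what `partial(combinations, r=2)` does on its own, here is a minimal sketch in plain Python, with no SparkContext needed (the function `flatMapValues` receives is just an ordinary callable applied to each value):

```python
from functools import partial
from itertools import combinations

# partial pins the keyword argument r=2, leaving the iterable open,
# so the result can be called with a single list of values
pairs_of_two = partial(combinations, r=2)
print(list(pairs_of_two([1, 2, 3])))  # [(1, 2), (1, 3), (2, 3)]
```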
or if you want all:
from itertools import chain
combs_one_to_n = rdd.flatMapValues(lambda vs: chain(*[
    combinations(vs, i) for i in range(1, len(vs) + 1)
]))
combs_one_to_n.take(5)
## [('a', (1,)), ('a', (2,)), ('a', (3,)), ('a', (1, 2)), ('a', (1, 3))]
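The same 1-to-n expansion can be checked locally without Spark; this sketch factors the lambda into a named helper for clarity:

```python
from itertools import chain, combinations

def all_combinations(vs):
    # chain together the combinations of every length from 1 to len(vs)
    return list(chain(*[combinations(vs, i) for i in range(1, len(vs) + 1)]))

print(all_combinations([1, 2, 3]))
# [(1,), (2,), (3,), (1, 2), (1, 3), (2, 3), (1, 2, 3)]
```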
Upvotes: 3